[Esip-preserve] ESIP Citation Guidelines

alicebarkstrom at frontier.com
Wed Oct 13 12:08:16 EDT 2010


Agreed.  Nice use of the examples.

The closest I've been able to come to "uniqueness" in this
case is our ability to identify an individual as unique, despite
changes with age (or hair color).  We do make that identification,
but it really complicates the math.

Bruce B.
----- Original Message -----
From: "Curt Tilmes" <Curt.Tilmes at nasa.gov>
To: esip-preserve at lists.esipfed.org
Sent: Wednesday, October 13, 2010 8:55:07 AM
Subject: Re: [Esip-preserve] ESIP Citation Guidelines

On 10/12/10 08:00, Lynnes, Christopher S. (GSFC-6102) wrote:
> (4) Are Dataset A and B the same?  A: yes, if they have the same
> dataset identifier (e.g., a DOI)
> (5) Did researcher A and B use the same data from datasets A and B?
> A: much more difficult to determine

We also need to be careful about the case of "Open" or "Dynamic"
datasets where we are still updating/adding granules.

Consider the case in my FOO example.

I proposed essentially that we map DOI => ESDT+Collection, but over
time we add granules to that dataset and, in practice, occasionally
remove or replace bad granules.

I showed how with that mapping, DOI alone is not enough to determine a
precise set of granules, but an identifier composed of {DOI +
Date/time stamp} could unambiguously refer to a set of granules.

That compound identifier is not unique though.  There are a lot of
Date/time stamps that would yield the same list of granules.
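To make the {DOI + Date/time stamp} idea concrete, here is a minimal sketch, assuming the archive keeps a per-dataset change log of granule additions and removals. The log layout, the granule ids, and the function name `granules_as_of` are hypothetical and purely illustrative; they are not part of the FOO example or any ESIP proposal. Resolving a stamp means replaying the log up to that date.

```python
from datetime import date

# Hypothetical change log for a single DOI: (effective date, action, granule id).
# Ids and dates are illustrative only, not taken from a real archive.
CHANGE_LOG = [
    (date(2001, 1, 1), "add", "granule-001"),
    (date(2001, 1, 2), "add", "granule-002"),
    (date(2001, 1, 3), "add", "granule-003"),
    (date(2001, 2, 3), "add", "granule-004"),
    (date(2001, 3, 1), "remove", "granule-002"),  # bad granule withdrawn...
    (date(2001, 3, 1), "add", "granule-002r"),    # ...and replaced
]


def granules_as_of(log, as_of):
    """Replay the change log and return the granule set as of date `as_of`."""
    current = set()
    # Sort by date only; the stable sort keeps same-day entries in log order.
    for when, action, granule in sorted(log, key=lambda entry: entry[0]):
        if when > as_of:
            break
        if action == "add":
            current.add(granule)
        elif action == "remove":
            current.discard(granule)
        else:
            raise ValueError(f"unknown action: {action}")
    return frozenset(current)
```

With this log, `granules_as_of(CHANGE_LOG, date(2001, 1, 4))` and `granules_as_of(CHANGE_LOG, date(2001, 1, 5))` return the same three granules: the compound identifier resolves unambiguously, but many different stamps map to one set.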

Do the questions you propose above warrant creating an additional
identifier that would be unique for each unique set of granules?

I'm going to keep going back to my FOO example to ground the
discussion.

Consider two researchers who download all the granules from FOOL2.002,
one on the date 2001-01-04, and one on the date 2001-01-05.  They will
get an identical list of granules, since the next granule wasn't
added until 2001-02-03.

If one cites the dataset with { doi:10.9999/US/FOOL2.v2, "2001-01-04" }
and the other cites the dataset with { doi:10.9999/US/FOOL2.v2,
"2001-01-05" }, they have different information included in their
citations even though in reality they used an identical set of
granules.

You couldn't easily answer your question 4 above looking at their
dataset citations alone.

If we really wanted to have a unique dataset identifier, we could
resolve this by adding yet another hash of a canonical list of unique
granule identifiers. (There are several ways to do this efficiently.)
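One possible way to build such a hash, sketched under assumptions of our own (the original message does not specify a scheme): deduplicate the granule ids, sort them, join them one per line, and take a SHA-256 digest. The function name `granule_set_fingerprint` and the newline-joined canonical form are hypothetical choices; other schemes, such as ones that can be updated incrementally as granules are added, are equally possible.

```python
import hashlib


def granule_set_fingerprint(granule_ids):
    """Return a SHA-256 hex digest of a canonical form of a granule-id set.

    Canonical form (an assumption for this sketch): duplicates removed,
    ids sorted, joined one per line, UTF-8 encoded. Two citations that
    resolve to the same granules get the same digest regardless of order.
    """
    ids = set(granule_ids)
    # A newline inside an id would make the joined form ambiguous.
    if any("\n" in gid for gid in ids):
        raise ValueError("granule ids must not contain newlines")
    canonical = "\n".join(sorted(ids))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

For the two researchers above, the 2001-01-04 and 2001-01-05 downloads would yield the same fingerprint, so question 4 could be answered by comparing digests rather than date stamps.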

(And I haven't even gotten into the "equivalent data" morass.)

Curt
_______________________________________________
Esip-preserve mailing list
Esip-preserve at lists.esipfed.org
http://www.lists.esipfed.org/mailman/listinfo/esip-preserve


More information about the Esip-preserve mailing list