[Esip-preserve] ESIP Citation Guidelines

Wed Oct 13 08:55:07 EDT 2010

On 10/12/10 08:00, Lynnes, Christopher S. (GSFC-6102) wrote:
> (4) Are Dataset A and B the same?  A: yes, if they have the same
> dataset identifier (e.g., a DOI)
> (5) Did researcher A and B use the same data from datasets A and B?
> A: much more difficult to determine

We also need to be careful about the case of "Open" or "Dynamic"
datasets where we are still updating/adding granules.

Consider the case in my FOO example.

I proposed essentially that we map DOI => ESDT+Collection, but over
time we add granules to that dataset and, in practice, occasionally
remove or replace bad granules.

I showed how, with that mapping, a DOI alone is not enough to determine
a precise set of granules, but an identifier composed of {DOI +
Date/time stamp} could unambiguously refer to a set of granules.

That compound identifier is not unique though.  There are a lot of
Date/time stamps that would yield the same list of granules.

Do the questions you propose above warrant creating an additional
identifier that would be unique for each unique set of granules?

I'm going to keep going back to my FOO example to ground the
discussion.

Consider two researchers who download all the granules from FOOL2.002,
one on 2001-01-04 and the other on 2001-01-05.  They will get an
identical list of granules, since the next granule wasn't added until
2001-02-03.

If one cites the dataset with { doi:10.9999/US/FOOL2.v2, "2001-01-04" }
and the other cites the dataset with { doi:10.9999/US/FOOL2.v2,
"2001-01-05" }, they have different information included in their
citations even though in reality they used an identical set of
granules.
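
As a rough sketch of the problem (Python, purely illustrative): assume
the archive keeps an add/remove date for every granule.  The granule
ids, the HISTORY table, and the granules_as_of() helper are all
hypothetical; only the 2001-02-03 date comes from the example above.

```python
from datetime import date

# Hypothetical granule history for FOOL2.002: (granule_id, added, removed).
# Ids and the first two dates are invented for illustration; removed=None
# means the granule is still in the dataset.
HISTORY = [
    ("FOOL2.002.g001", date(2001, 1, 1), None),
    ("FOOL2.002.g002", date(2001, 1, 2), None),
    ("FOOL2.002.g003", date(2001, 2, 3), None),
]

def granules_as_of(history, when):
    """Return the set of granule ids present in the dataset on `when`."""
    return frozenset(
        gid
        for gid, added, removed in history
        if added <= when and (removed is None or when < removed)
    )

a = granules_as_of(HISTORY, date(2001, 1, 4))
b = granules_as_of(HISTORY, date(2001, 1, 5))
c = granules_as_of(HISTORY, date(2001, 2, 3))

print(a == b)  # True: different date stamps, identical granule set
print(a == c)  # False: a granule was added on 2001-02-03
```

The two citations differ as strings, yet both resolve to the same set.
Seeing that requires the archive's history, not just the citations.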

You couldn't easily answer your question 4 above looking at their
dataset citations alone.

If we really wanted to have a unique dataset identifier, we could
resolve this by adding yet another hash of a canonical list of unique
granule identifiers. (Several ways to accomplish this efficiently.)
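
One possible way to do it, sketched in Python.  The canonical form
(de-duplicated, sorted, newline-joined ids), the choice of SHA-256, and
the granule ids are my assumptions for illustration, not a proposal for
the guidelines.

```python
import hashlib

def granule_set_id(granule_ids):
    """Hash a canonical form of a granule list.

    Canonical form (an assumption): unique ids, sorted, joined by newlines,
    so download order and duplicates do not change the result.
    """
    canonical = "\n".join(sorted(set(granule_ids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical granule ids, for illustration only.
researcher_a = ["FOOL2.002.g001", "FOOL2.002.g002"]
researcher_b = ["FOOL2.002.g002", "FOOL2.002.g001"]  # same set, other order
later_user = researcher_a + ["FOOL2.002.g003"]

print(granule_set_id(researcher_a) == granule_set_id(researcher_b))  # True
print(granule_set_id(researcher_a) == granule_set_id(later_user))    # False
```

Two citations carrying the same hash would then refer to the same set
of granules, whatever date stamps they include.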

(And I haven't even gotten into the "equivalent data" morass.)

Curt