[Esip-preserve] [Infusion] Suggestion for tech infusion activity vis a vis MEaSUREs

Curt Tilmes Curt.Tilmes at nasa.gov
Mon Apr 19 07:27:44 EDT 2010


On 04/14/2010 12:57 PM, Lynnes, Christopher S. (GSFC-6102) wrote:
> With the complexity and diversity of some of the versioning schemes  
> out there, I would advocate for using a DOI for each Dataset (i.e.,  
> DataType + Version).  If a researcher used data from multiple versions  
> of a dataset, then the citation of multiple DOIs will make that  
> crystal clear.

[...]

> To expound on my reasoning below just a little bit, part of (or even
> most of) the point of unique identifiers is to eliminate ambiguity.
> Complicated versioning schemes leave enough ambiguity in (e.g. MODIS
> versions 005 and 051--I never quite grokked the difference) that
> they warrant DOIs at the version level, just to emphasize that they
> are different DataType versions.

On 04/14/2010 01:09 PM, J Glassy wrote:

> for what it's worth, I couldn't agree more with Chris' last two
> emails. Resolving ambiguity up front, proactively, has got to be
> one of the biggest motivations for adopting unique identifiers.


We still have some problems with terminology/definitions.  I'll stick
with my definitions for now, but acknowledge that there are
alternatives (what I call Datatype, NSIDC calls Dataset, and what I
call Dataset they call Data Version).

I was advocating including two identifiers in the citation, one (a
DOI) for the Datatype and one a very precise identifier (I proposed a
PURL) for the Dataset matching its granule membership at a point in
time (vital, I think, for Open Datasets).
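
To make "granule membership at a point in time" concrete, here is a
minimal Python sketch.  It assumes each granule record carries the
times it was added to and removed from the Dataset; the Granule class,
its field names, the granule names, and membership_at() are all
hypothetical illustrations, not part of any existing archive system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Granule:
    # Hypothetical record; the field names are illustrative assumptions.
    name: str
    added: datetime
    removed: Optional[datetime] = None  # None means still in the Dataset


def membership_at(granules, when):
    """Return the sorted names of granules that belonged to the Dataset at `when`."""
    return sorted(
        g.name
        for g in granules
        if g.added <= when and (g.removed is None or when < g.removed)
    )


# Made-up granules for an Open Dataset that changes over time.
granules = [
    Granule("FOO.A2010001", datetime(2010, 1, 2)),
    Granule("FOO.A2010002", datetime(2010, 1, 3), removed=datetime(2010, 4, 10)),
    Granule("FOO.A2010100", datetime(2010, 4, 11)),
]

# The same Dataset resolves to different granule sets at different times.
print(membership_at(granules, datetime(2010, 4, 1, 14, 0, 0)))
print(membership_at(granules, datetime(2010, 4, 15)))
```

The point of the sketch is only that a timestamp, together with the
Datatype and Collection, is enough to pin down one exact set of
granules, which a single DOI per Datatype or per Dataset cannot do.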

Here is my strawman example:

a) Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO,
   Collection 2, http://purl.org/NET/MyOrg/data/FOO/2/2010-04-01T14:00:00.

What exactly are you proposing to change? To add a third identifier, a
DOI for the Dataset in addition to the Datatype DOI?

b) Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO,
   Collection 2, DOI: 10.12345/FOO.2,
   http://purl.org/NET/MyOrg/data/FOO/2/2010-04-01T14:00:00.

or to replace the Datatype DOI with just a Dataset DOI?

c) Smith, John. "Some Earth Science Data", FOO, Collection 2, DOI:
   10.12345/FOO.2, http://purl.org/NET/MyOrg/data/FOO/2/2010-04-01T14:00:00.


The inclusion of the second identifier addresses the ambiguity
problem.  If the inclusion of the first DOI is contributing to the
ambiguity, we could always remove the DOI entirely and rely on the
PURL, since neither DOI (Datatype or Dataset) is sufficient to
precisely identify the set of granules:

d) Smith, John. "Some Earth Science Data", FOO, Collection 2,
   http://purl.org/NET/MyOrg/data/FOO/2/2010-04-01T14:00:00.
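
The four strawman forms differ only in which DOIs they carry, so they
can be sketched as one small Python function.  This is an illustration
under assumptions, not a proposed implementation: dataset_purl() and
cite() are hypothetical names, the PURL layout
(base/Datatype/Collection/timestamp) is simply read off the examples
above, and commas are used as the separator throughout.

```python
from datetime import datetime

# Base taken from the strawman examples; not a registered PURL domain.
PURL_BASE = "http://purl.org/NET/MyOrg/data"


def dataset_purl(datatype, collection, when):
    """Build the PURL for a Dataset's granule membership at time `when`."""
    return f"{PURL_BASE}/{datatype}/{collection}/{when.isoformat()}"


def cite(author, title, datatype, collection, when,
         datatype_doi=None, dataset_doi=None):
    """Assemble a citation; each DOI is included only if it is supplied."""
    parts = [f'{author}. "{title}"', datatype]
    if datatype_doi:
        parts.append(f"DOI: {datatype_doi}")
    parts.append(f"Collection {collection}")
    if dataset_doi:
        parts.append(f"DOI: {dataset_doi}")
    parts.append(dataset_purl(datatype, collection, when))
    return ", ".join(parts) + "."


common = dict(
    author="Smith, John",
    title="Some Earth Science Data",
    datatype="FOO",
    collection=2,
    when=datetime(2010, 4, 1, 14, 0, 0),
)

form_a = cite(**common, datatype_doi="10.12345/FOO")
form_b = cite(**common, datatype_doi="10.12345/FOO", dataset_doi="10.12345/FOO.2")
form_c = cite(**common, dataset_doi="10.12345/FOO.2")
form_d = cite(**common)

for form in (form_a, form_b, form_c, form_d):
    print(form)
```

In every form the PURL is the only part that changes when the granule
membership changes; the DOIs stay fixed for the life of the Datatype or
the Collection.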


It isn't feasible to use DOIs to precisely identify all sets of
granules in an ever-changing Open Dataset, so perhaps we should just
dispense with them?

Curt

