[Esip-preserve] [Infusion] Suggestion for tech infusion activity vis a vis MEaSUREs

Lynnes, Christopher S. (GSFC-6102) christopher.s.lynnes at nasa.gov
Mon Apr 19 08:02:30 EDT 2010


On Apr 19, 2010, at 7:27 AM, Curt Tilmes wrote:

> On 04/14/2010 12:57 PM, Lynnes, Christopher S. (GSFC-6102) wrote:
>> With the complexity and diversity of some of the versioning schemes  
>> out there, I would advocate for using a DOI for each Dataset (i.e.,  
>> DataType + Version).  If a researcher used data from multiple versions  
>> of a dataset, then the citation of multiple DOIs will make that  
>> crystal clear.
> 
> [...]
> 
>> To expound on my reasoning below just a little bit, part of (or even
>> most of) the point of unique identifiers is to eliminate ambiguity.
>> Complicated versioning schemes leave enough ambiguity in (e.g. MODIS
>> versions 005 and 051--I never quite grokked the difference) that
>> they warrant DOIs at the version level, just to emphasize that they
>> are different DataType versions.
> 
> On 04/14/2010 01:09 PM, J Glassy wrote:
> 
>> for what it's worth, I couldn't agree more with Chris' last two
>> emails. Resolving ambiguity up front, proactively, has got to be
>> one of the biggest motivations for adopting unique identifiers.
> 
> 
> We still have some problems with terminology/definitions.  I'll stick
> with my definitions for now, but acknowledge that there are
> alternatives (what I call Datatype, NSIDC calls Dataset, and what I
> call Dataset they call Data Version).
> 
> I was advocating including two identifiers in the citation, one (a
> DOI) for the Datatype and one a very precise identifier (I proposed a
> PURL) for the Dataset matching its granule membership at a point in
> time (vital, I think, for Open Datasets.)
> 
> Here is my strawman example:
> 
> a) Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO,
>   Collection 2, http://purl.org/NET/MyOrg/data/FOO/2/2010-04-01T14:00:00.
> 
> What exactly are you proposing to change? To add a third identifier, a
> DOI for the Dataset in addition to the Datatype DOI?
> 
> b) Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO.
>   Collection 2, DOI: 10.12345/FOO.2,
>   http://purl.org/NET/MyOrg/data/FOO/2/2010-04-01T14:00:00.
> 
> or to replace the Datatype DOI with just a Dataset DOI?
> 
> c) Smith, John. "Some Earth Science Data", FOO, Collection 2, DOI:
>   10.12345/FOO.2, http://purl.org/NET/MyOrg/data/FOO/2/2010-04-01T14:00:00.

My preference is (c), as I think it would be less confusing to humans (non-experts in the vagaries of science algorithms and production) trying to parse it.  This group may know that "Collection 2" signifies a different dataset version, but other folks use "Version 2", "Edition 2", "Algorithm 2", and so on.  So including two DOIs to identify the dataset, as in (b), will confuse readers:  is it one dataset being used, or two?
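
To make option (c) concrete, here is a minimal sketch of how such a citation might be assembled from its parts. The function names are hypothetical, and the DOI and PURL layouts are simply read off Curt's strawman (DOI prefix 10.12345, PURL base http://purl.org/NET/MyOrg/data); nothing here is an agreed convention.

```python
from datetime import datetime

# Hypothetical helpers; the identifier layouts are assumptions taken from
# the strawman examples in this thread, not from any agreed standard.

def dataset_doi(prefix, datatype, version):
    """DOI at the Dataset level (DataType + Version), e.g. 10.12345/FOO.2."""
    return f"{prefix}/{datatype}.{version}"

def snapshot_purl(base, datatype, version, as_of):
    """PURL naming the granule membership of the Dataset at a point in time."""
    stamp = as_of.strftime("%Y-%m-%dT%H:%M:%S")
    return f"{base}/{datatype}/{version}/{stamp}"

def cite_option_c(author, title, datatype, version, doi, purl):
    """Option (c): one Dataset-level DOI plus the precise PURL."""
    return (f'{author}. "{title}", {datatype}, Collection {version}, '
            f"DOI: {doi}, {purl}.")

doi = dataset_doi("10.12345", "FOO", 2)
purl = snapshot_purl("http://purl.org/NET/MyOrg/data", "FOO", 2,
                     datetime(2010, 4, 1, 14, 0, 0))
citation = cite_option_c("Smith, John", "Some Earth Science Data",
                         "FOO", 2, doi, purl)
print(citation)
```

The reader sees exactly one DOI, and it already encodes the version, so the "Collection 2" label is a convenience rather than the thing that disambiguates.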

> 
> 
> The inclusion of the second identifier addresses the ambiguity
> problem.  If the inclusion of the first DOI is contributing to the
> ambiguity, we could always remove the DOI entirely, relying on the
> PURL, since neither DOI (Datatype or Dataset) is sufficient to
> precisely identify the set of granules:
> 
> d) Smith, John. "Some Earth Science Data", FOO, Collection 2,
>   http://purl.org/NET/MyOrg/data/FOO/2/2010-04-01T14:00:00.

This may be unambiguous, but again it is difficult for humans to understand.  More importantly, it makes citation searching quite difficult, and citation searching is one of the key goals of the entire unique identifier exercise at the dataset level.
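
As a rough illustration of the citation-searching point, the sketch below pulls DOIs out of citation strings with a regular expression. The pattern is my own loose approximation of DOI syntax rather than an official one, and the function name is hypothetical. The citation strings are the strawman options (b), (c), and (d) from this thread.

```python
import re

# Loose, assumed pattern for a DOI: "10.", a 4-9 digit registrant code,
# a slash, then a suffix running up to whitespace, a comma, or a quote.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s,"]+')

def find_dois(citation):
    """Return the DOIs found in a citation, minus any sentence-ending period."""
    return [m.rstrip(".") for m in DOI_PATTERN.findall(citation)]

PURL = "http://purl.org/NET/MyOrg/data/FOO/2/2010-04-01T14:00:00"

option_b = ('Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO. '
            f"Collection 2, DOI: 10.12345/FOO.2, {PURL}.")
option_c = ('Smith, John. "Some Earth Science Data", FOO, Collection 2, '
            f"DOI: 10.12345/FOO.2, {PURL}.")
option_d = ('Smith, John. "Some Earth Science Data", FOO, Collection 2, '
            f"{PURL}.")

for label, text in [("b", option_b), ("c", option_c), ("d", option_d)]:
    print(label, find_dois(text))
```

Option (c) gives a search exactly one key per dataset version. Option (b) gives two, which a reader or an indexer could count as two datasets. Option (d) gives none, so anyone searching for uses of the dataset would have to match on the PURL, which changes with every timestamp.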
> 
> 
> It isn't feasible to use DOIs to precisely identify all sets of
> granules in an ever changing Open Data Set, so perhaps we should just
> dispense with them?
> 
> Curt
> 

--
Chris Lynnes   301-614-5185   NASA/GSFC Code 610.2, B32/S130B


