[Esip-preserve] [Infusion] Suggestion for tech infusion activity vis a vis MEaSUREs

Christopher Lynnes Chris.Lynnes at nasa.gov
Wed Apr 14 13:03:33 EDT 2010


To expound on my reasoning below just a little bit, part of (or even  
most of) the point of unique identifiers is to eliminate ambiguity.   
Complicated versioning schemes leave enough ambiguity in (e.g. MODIS  
versions 005 and 051--I never quite grokked the difference) that they  
warrant DOIs at the version level, just to emphasize that they are  
different DataType versions.

On Apr 14, 2010, at 12:57 PM, Christopher Lynnes wrote:

> With the complexity and diversity of some of the versioning schemes
> out there, I would advocate for using a DOI for each Dataset (i.e.,
> DataType + Version).  If a researcher used data from multiple versions
> of a dataset, then the citation of multiple DOIs will make that
> crystal clear.
>
> On Apr 14, 2010, at 10:04 AM, Curt Tilmes wrote:
>
>> On 03/23/2010 02:35 PM, Wilson, Brian D (335G) wrote:
>>> We will need to formulate this consensus recommendation quickly.
>>>
>>> I suggest two features:
>>>
>>> 1) Publish the MEaSUREs datasets as a dataset paper in an appropriate
>>> journal so the *dataset* has a referenceable DOI.
>>
>> In the ESIP Preservation cluster identifiers group, we've begun to
>> distinguish the concept of "Data Type" (what EOS calls an ESDT) from
>> "Dataset", which is a specific version (in EOS parlance, a
>> 'Collection') of that Data Type.
>>
>> I put some strawman terms and definitions here: (up for discussion!)
>> http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Identifiers#Definitions
>>
>> I think each of those concepts needs a referenceable identifier from
>> which we can construct data citations.
>>
>> For example, consider ESDT FOO.  It is archived in DAAC MyOrg
>> (CrossRef DOI Org 10.12345), which has archived data from ESDT FOO
>> for collection 1 (a "Closed Data Set") and is currently archiving
>> collection 2 (an "Open Data Set" still being processed from current
>> data).
>>
>> We need a citation for the general data type:
>>
>> Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO.
>>
>> and a citation for each data set (each version of the data type).
>> Rather than registering a new DOI for each new version (collection),
>> I'm inclined to advise reusing the data type DOI:
>>
>> Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO,
>> Collection 1.
>>
>> This "datatype DOI" could also be the 'published paper describing the
>> dataset' DOI, but I guess I'd be inclined to have separate DOIs, one
>> for the paper, and one for the datatype.  Then a paper could
>> reference either or both as appropriate to the nature of the use.
>>
>>
>> Alternatively, we could register distinct DOIs for each new version:
>>
>> Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO.1,
>> Collection 1.
>>
>> For the "Open Data Set" case, I think we must precisely qualify the
>> citation to reference the specific granule membership of the dataset.
>> There are a few ways to do this, but I think the cleanest is a
>> date/time stamp:
>>
>> Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO,
>> Collection 2, 2010-04-01T14:00:00.
>>
>>> 2) Serve the dataset granules from URLs that are as permanent as
>>> possible, from the origin sites and the receiving DAACs.  The grabbed
>>> real estate, the root of the URL, should reference MEaSUREs and the
>>> institution, and not contain the name of a computer (or something
>>> else that is dumb).
>>>
>>> 3) As far as truly permanent URIs, I don't know what to say.  I
>>> don't think the handle system, XRIs, or any other system has
>>> gotten traction (a large market share).  This is mostly the fault of
>>> the W3C, which thinks the entire problem has been solved by existing
>>> URLs and URNs.  Hogwash.
>>
>> I like including both identifiers, datatype and dataset.  I'm leaning
>> toward using DOIs for the datatype and PURLs for the precise data
>> specification and locator:
>>
>> Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO,
>> Collection 2, http://purl.org/NET/MyOrg/data/FOO/2/2010-04-01T14:00:00.
>>
>> (Though, as Ruth points out, ARKs are nice too and have their own
>> benefits.)
>>
>> Curt
>>
>> _______________________________________________
>> Infusion mailing list
>> Infusion at lists.sciencedatasystems.org
>> http://lists.sciencedatasystems.org/mailman/listinfo/infusion_lists.sciencedatasystems.org
>
> --
> Christopher Lynnes             NASA/GSFC, Code 610.2
> 301-614-5185

--
Christopher Lynnes             NASA/GSFC, Code 610.2          
301-614-5185
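
For readers comparing the options quoted above, the citation styles Curt
describes can be sketched in Python.  This is only an illustration under
stated assumptions, not something proposed in the thread: the DataType
class, the cite() function, and its parameter names are hypothetical, and
FOO, MyOrg's prefix 10.12345, and "Smith, John" are the placeholder
values from Curt's examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataType:
    """Hypothetical record for a data type (what EOS calls an ESDT)."""
    author: str       # e.g. "Smith, John"
    title: str        # e.g. "Some Earth Science Data"
    short_name: str   # e.g. "FOO"
    doi_prefix: str   # the archive's DOI prefix, e.g. "10.12345"


def cite(dt: DataType,
         collection: Optional[int] = None,
         per_version_doi: bool = False,
         as_of: Optional[str] = None) -> str:
    """Build a citation string in one of the styles quoted in the thread.

    collection      -- dataset version; omit to cite the data type itself
    per_version_doi -- True appends ".<collection>" to the DOI (a distinct
                       DOI per version); False reuses the data type DOI
    as_of           -- timestamp string qualifying an open data set
    """
    doi = f"{dt.doi_prefix}/{dt.short_name}"
    if per_version_doi and collection is not None:
        doi += f".{collection}"
    parts = [f'{dt.author}. "{dt.title}"', dt.short_name, f"DOI: {doi}"]
    if collection is not None:
        parts.append(f"Collection {collection}")
    if as_of is not None:
        parts.append(as_of)
    return ", ".join(parts) + "."


foo = DataType("Smith, John", "Some Earth Science Data", "FOO", "10.12345")
print(cite(foo))                                  # data type only
print(cite(foo, collection=1))                    # reuse the data type DOI
print(cite(foo, collection=1, per_version_doi=True))   # DOI per version
print(cite(foo, collection=2, as_of="2010-04-01T14:00:00"))  # open data set
```

With per_version_doi=True, two versions of the same data type yield two
different DOI strings, which is the disambiguation argued for at the top
of this message; with the default, the version is carried only in the
"Collection N" text that follows the shared DOI.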