[Esip-preserve] [Infusion] Suggestion for tech infusion activity vis a vis MEaSUREs

Wed Apr 14 13:09:39 EDT 2010

all,

for what it's worth, I couldn't agree more with Chris' last two
emails. Resolving ambiguity up front,
proactively, has got to be one of the biggest motivations for
adopting unique identifiers.

joe

On Wed, Apr 14, 2010 at 11:03 AM, Christopher Lynnes
<Chris.Lynnes at nasa.gov> wrote:
> To expound on my reasoning below just a little bit, part of (or even most
> of) the point of unique identifiers is to eliminate ambiguity.  Complicated
> versioning schemes leave enough ambiguity in (e.g. MODIS versions 005 and
> 051--I never quite grokked the difference) that they warrant DOIs at the
> version level, just to emphasize that they are different DataType versions.
>
> On Apr 14, 2010, at 12:57 PM, Christopher Lynnes wrote:
>
>> With the complexity and diversity of some of the versioning schemes
>> out there, I would advocate for using a DOI for each Dataset (i.e.,
>> DataType + Version).  If a researcher used data from multiple versions
>> of a dataset, then the citation of multiple DOIs will make that
>> crystal clear.
>>
>> On Apr 14, 2010, at 10:04 AM, Curt Tilmes wrote:
>>
>>> On 03/23/2010 02:35 PM, Wilson, Brian D (335G) wrote:
>>>>
>>>> We will need to formulate this consensus recommendation quickly.
>>>>
>>>> I suggest two features:
>>>>
>>>> 1) Publish the MEASUREs datasets as a dataset paper in an appropriate
>>>> journal so the *dataset* has a referenceable DOI.
>>>
>>> In the ESIP Preservation cluster identifiers group, we've begun to
>>> distinguish the concept of "Data Type" (what EOS calls an ESDT)
>>> from "Dataset", which is a specific version (EOS parlance
>>> 'Collection') of that Data Type.
>>>
>>> I put some strawman terms and definitions here: (up for discussion!)
>>>
>>> http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Identifiers#Definitions
>>>
>>> I think each of those concepts needs a referenceable identifier from
>>> which we can construct data citations.
>>>
>>> For example, consider ESDT FOO.  It is archived in DAAC MyOrg
>>> (CrossRef DOI Org 10.12345), which has archived data from ESDT FOO for
>>> collection 1 (a "Closed Data Set") and is currently archiving
>>> collection 2 (an "Open Data Set" still being processed from current
>>> data).
>>>
>>> We need a citation for the general data type:
>>>
>>> Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO.
>>>
>>> and a citation for each data set (each version of the data type).
>>> Rather than registering a new DOI for each new version (collection),
>>> I'm inclined to advise reusing the data type DOI:
>>>
>>> Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO,
>>> Collection 1.
>>>
>>> This "datatype DOI" could also be the 'published paper describing the
>>> dataset' DOI, but I guess I'd be inclined to have separate DOIs, one
>>> for the paper, and one for the datatype.  Then a paper could reference
>>> either or both as appropriate to the nature of the use.
>>>
>>>
>>> Alternatively, we could register distinct DOIs for each new version:
>>>
>>> Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO.1,
>>> Collection 1.
>>>
>>> For the "Open Data Set" case, I think we must precisely qualify the
>>> citation to reference the specific granule membership of the dataset.
>>> There are a few ways to do this, but I think the cleanest is a
>>> date/time stamp:
>>>
>>> Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO,
>>> Collection 2, 2010-04-01T14:00:00.
>>>
>>>> 2) Serve the dataset granules from URLs that are as permanent as
>>>> possible, at the origin sites and the receiving DAACs.  The grabbed real
>>>> estate, the root of the URL, should reference MEASUREs and the
>>>> institution, and not contain the name of a computer (or something
>>>> else that is dumb).
>>>>
>>>> 3) As far as truly permanent URIs, I don't know what to say.  I
>>>> don't think either the handle system, XRIs, or any other system has
>>>> gotten traction (a large market share).  This is mostly the fault of
>>>> the W3C, which thinks the entire problem has been solved by existing
>>>> URLs and URNs.  Hogwash.
>>>
>>> I like including both identifiers, datatype and dataset.  I'm leaning
>>> toward using DOIs for the datatype and PURLs for the precise data
>>> specification and locator:
>>>
>>> Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO,
>>> Collection 2,
>>> http://purl.org/NET/MyOrg/data/FOO/2/2010-04-01T14:00:00.
>>>
>>> (Though, as Ruth points out, ARKs are nice too and have their own
>>> benefits.)
>>>
>>> Curt
>>>
>>> _______________________________________________
>>> Infusion mailing list
>>> Infusion at lists.sciencedatasystems.org
>>>
>>> http://lists.sciencedatasystems.org/mailman/listinfo/infusion_lists.sciencedatasystems.org
>>
>> --
>> Christopher Lynnes             NASA/GSFC, Code 610.2
>> 301-614-5185
>>
>>
>
> --
> Christopher Lynnes             NASA/GSFC, Code 610.2         301-614-5185
>
>
>
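
To make the citation variants in the quoted thread concrete, here is a minimal sketch that builds each form from the same inputs. The function name format_citation and its parameters are hypothetical, and the author, title, data type FOO, and DOI prefix 10.12345 are the placeholder values from Curt's examples, not registered identifiers.

```python
from datetime import datetime
from typing import Optional


def format_citation(
    author: str,
    title: str,
    data_type: str,
    doi_prefix: str,
    collection: Optional[int] = None,
    per_version_doi: bool = False,
    as_of: Optional[datetime] = None,
) -> str:
    """Build a data citation string in the style of the thread's examples.

    per_version_doi=False reuses the data type DOI for every collection;
    per_version_doi=True appends ".<collection>" to form a distinct DOI.
    as_of qualifies an open data set with a date/time stamp.
    """
    doi = f"{doi_prefix}/{data_type}"
    if per_version_doi and collection is not None:
        doi += f".{collection}"

    parts = [f'{author}. "{title}", {data_type}, DOI: {doi}']
    if collection is not None:
        parts.append(f"Collection {collection}")
    if as_of is not None:
        # ISO 8601 timestamp, second resolution, as in the thread's example
        parts.append(as_of.isoformat(timespec="seconds"))
    return ", ".join(parts) + "."


common = dict(
    author="Smith, John",
    title="Some Earth Science Data",
    data_type="FOO",
    doi_prefix="10.12345",
)

data_type_citation = format_citation(**common)
closed_reused_doi = format_citation(**common, collection=1)
closed_distinct_doi = format_citation(**common, collection=1, per_version_doi=True)
open_with_timestamp = format_citation(
    **common, collection=2, as_of=datetime(2010, 4, 1, 14, 0, 0)
)

for citation in (
    data_type_citation,
    closed_reused_doi,
    closed_distinct_doi,
    open_with_timestamp,
):
    print(citation)
```

With per_version_doi=True, two versions of the same data type produce two different DOI strings, which is the disambiguation Chris argues for. With the default, the versions differ only in the trailing "Collection N" text.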

-- 
----------------------------------------------------------------
Joseph Glassy
Lead Software Engineer (contractor)
NASA Measures (Freeze/Thaw),Rm CFC 424
College of Forestry and Conservation
Univ. Montana, Missoula, MT 59812
Tel: 406-243-6318     Cellular: 406-544-3315
and:
Research Analyst/Programmer
University of Montana NSF EPSCoR Program
Davidson Honors College Room 013
Missoula, MT 59812
um.glassy at gmail.com
Campus phone 243-6337   Cell(406) 544-3315