[Esip-preserve] [Infusion] TIWG Data Stewardship WG: 2012-02-02

Curt Tilmes Curt.Tilmes at nasa.gov
Thu Feb 2 11:55:27 EST 2012


On 02/02/2012 10:18 AM, Kobler, Benjamin (GSFC-5860) wrote:
> actually I'm also interested in identifiers, but I don't know what's
> going on.
>
> Is there something you can point me to that summarizes the various
> proposals out there?

(I'm going to cc: the tech infusion and esip preserve lists on this to
broaden the discussion. :-)


There has been a lot of discussion, and a number of important
developments.

We started with "data identifiers", and Ruth Duerr led a study
culminating in this paper:
http://dx.doi.org/10.1007/s12145-011-0083-6

In summary, we recommend assigning DOIs to 'collection level
datasets', and using UUIDs for granule level data.

DOIs have been used for some time quite successfully to uniquely
identify and reference scientific papers.  (Most of the prominent
journals have adopted them.)  We think a) it makes sense to cite
datasets used in research and b) using DOI to do so has lots of
advantages.

In EOS Data Model-speak, the combination of "ESDT+Collection" should
get a DOI.  (So, collection 5 of MODIS Level 2 land surface reflectance
gets one DOI, and collection 6 gets a new one.)
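
A rough sketch of that split (not the ESDIS implementation): key each
DOI on ESDT plus collection, and mint a UUID for each granule.  The
names CollectionKey, doi_registry and new_granule_id are made up for
illustration, "MOD09" is only an example ESDT short name, and the DOI
strings are placeholders rather than real assignments.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionKey:
    esdt: str        # ESDT short name, e.g. "MOD09" (example only)
    collection: int  # collection (version) number


# Hypothetical registry: one DOI per ESDT+Collection.
# The DOI strings are placeholders, not real assignments.
doi_registry = {
    CollectionKey("MOD09", 5): "10.XXXX/placeholder-c5",
    CollectionKey("MOD09", 6): "10.XXXX/placeholder-c6",
}


def new_granule_id() -> str:
    """Mint a random (version 4) UUID for a single granule."""
    return str(uuid.uuid4())


granule_id = new_granule_id()
```

The point of the frozen dataclass is only that the pair (ESDT,
collection) is the thing being identified, so a new collection
naturally gets a new registry entry.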

John Moses prepared a proposal for the ESDIS project to assign DOIs to
all of the datasets under the ESDIS purview.  It has been approved and
is in the process of being implemented.

NOAA (Jeff De La Beaujardiere) has also been looking into adopting
something similar; we hope they move forward.


The ESIP Federation Assembly also approved a set of guidelines for
data citation for providers (http://bit.ly/data_citation) that
describes data identifiers and locators and their use in citations.


Next we want to move beyond data.


We would like to follow Linked Data (http://linkeddata.org/) principles
(http://www.w3.org/DesignIssues/LinkedData.html):

1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the
standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things.
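
In practice, principles 2 and 3 mean dereferencing the HTTP URI and
asking for a machine-readable representation.  A minimal sketch,
assuming the server supports content negotiation for RDF (not every
server does); rdf_request is a made-up helper name:

```python
import urllib.request


def rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks for RDF rather than HTML."""
    return urllib.request.Request(
        uri,
        headers={"Accept": "text/turtle, application/rdf+xml;q=0.9"},
    )


req = rdf_request(
    "http://dbpedia.org/resource/Moderate-Resolution_Imaging_Spectroradiometer"
)

# To actually fetch it (needs network access):
# with urllib.request.urlopen(req) as resp:
#     print(resp.headers.get("Content-Type"))
```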


Whenever we refer to "MODIS" or "Ben Kobler" or even "NASA", we want
to use a URI with the above properties.


Here are some previous thoughts:
(http://www.lists.esipfed.org/pipermail/esip-preserve/2010/000343.html)

(Ignore the OPM stuff in there; we are now looking at the successor to
OPM, the Prov ontology: http://www.w3.org/2011/prov)


>If we hook our granules up to ESDTs which get hooked up to
>spacecraft, you can hook into things like dbpedia which already has a
>smattering of things that it inherited from wikipedia, things like
>"dbpedia:Moderate-resolution_Imaging_Spectroradiometer", which can
>also be found from dbpedia:MODIS:
>
>http://dbpedia.org/describe/?url=http://dbpedia.org/resource/Moderate-Resolution_Imaging_Spectroradiometer
>
>which is the same as the
>
>yago-res:Moderate-Resolution_Imaging_Spectroradiometer
>
>which the yago (http://www.mpi-inf.mpg.de/yago-naga/yago/) people
>already have defined in their ontology as a type of
>yago:EarthObservationSatellites and
>yago:SpacecraftInstruments.


Some common concepts like "MODIS" or "NASA" already exist in some
linked data databases, and we want to make sure our identifiers link with
them as well. Whatever identifier scheme we come up with, we should be
able to assert equivalence (owl:sameAs) between our identifier and
dbpedia and yago, etc.
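
Asserting that equivalence comes down to publishing one triple.  A
small sketch that writes it as an N-Triples line; the NASA-side URI
is hypothetical (no such scheme exists yet), and same_as is a made-up
helper name:

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as(subject: str, obj: str) -> str:
    """Return one N-Triples line asserting subject owl:sameAs obj."""
    return f"<{subject}> <{OWL_SAME_AS}> <{obj}> ."


# Hypothetical URI of ours; the DBpedia URI is the one quoted above.
ours = "http://some.site.nasa.gov/instrument/MODIS"
dbpedia = "http://dbpedia.org/resource/Moderate-Resolution_Imaging_Spectroradiometer"

triple = same_as(ours, dbpedia)
print(triple)
```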

Take something like 'person'.

Here's an identifier for "Ben Kobler":
http://dblp.l3s.de/d2r/page/authors/Ben_Kobler

DBLP asserts some facts about that person, including the fact that he
has a "foaf:name" of "Ben Kobler".


The foaf project (http://www.foaf-project.org/) has developed a widely
used ontology for describing some aspects of people.  We probably want
to take advantage of portions of that, so our information about people
can be linked with other information about that person.


More importantly, if some data shows up at NOAA, it can be linked back
to references to that same data at NASA.


We could always just make something like

http://some.site.nasa.gov/people/Ben_Kobler

but I don't particularly like using names for identifiers (that sounds
weirder than it is).

If Ben Kobler changed his name to John Doe, you might end up with
something weird like asserting that
http://some.site.nasa.gov/people/Ben_Kobler had a foaf:name of "John
Doe".  You could change the URI for the resource itself, and simply
assert its equivalence with the original, but we don't really want to
do that since Cool URIs don't change.
http://www.w3.org/Provider/Style/URI.html
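
One way around the rename problem, sketched under assumptions: mint
an opaque URI (here with a random UUID as the last path segment) and
treat the name as just another property.  The URI layout and the
helper names mint_person_uri and name_triple are hypothetical, not a
proposed scheme.

```python
import uuid

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
BASE = "http://some.site.nasa.gov/people/"  # placeholder host from above


def mint_person_uri() -> str:
    """Mint a URI whose last segment is an opaque UUID, not a name."""
    return BASE + str(uuid.uuid4())


def name_triple(person_uri: str, name: str) -> str:
    """Return an N-Triples line giving the person a foaf:name."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{person_uri}> <{FOAF_NAME}> "{escaped}" .'


person = mint_person_uri()
before = name_triple(person, "Ben Kobler")
after = name_triple(person, "John Doe")  # name changes, URI does not
```

The URI stays the same across the rename, so nothing that already
links to it has to change.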

Also, what do we do when Ben goes to work for NOAA?  Do we keep
asserting facts about his time at NASA with
http://some.site.nasa.gov/people/Ben_Kobler?  Should NOAA assert facts
about that URI, or make their own URI for him and assert equivalence
between them?
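
If each agency mints its own URI and asserts equivalence, anyone
consuming the data has to group the URIs that denote the same thing.
A sketch of that grouping using a small union-find over owl:sameAs
pairs; the NOAA and instrument URIs are hypothetical, and
same_as_groups is a made-up name:

```python
def same_as_groups(pairs):
    """Group URIs connected by owl:sameAs assertions (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


pairs = [
    # Same person, one URI per agency (NOAA URI is hypothetical).
    ("http://some.site.nasa.gov/people/Ben_Kobler",
     "http://some.site.noaa.gov/people/Ben_Kobler"),
    # Unrelated equivalence: an instrument (our URI is hypothetical).
    ("http://some.site.nasa.gov/instrument/MODIS",
     "http://dbpedia.org/resource/Moderate-Resolution_Imaging_Spectroradiometer"),
]
groups = same_as_groups(pairs)
```

This only shows the bookkeeping cost of the "make their own URI"
option; it does not answer which option is better.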


Anyway, we'd like to design some good identifiers for all that stuff,
then push them through things like NASA ESDSWG SPG and ESIP Fed to get
everyone to use them (like we are doing with DOIs for collection level
datasets).


We also have to determine exactly what we are trying to identify.
Naturally data, people, papers, and datasets, but how do we organize
more complicated things?  We started brainstorming a little here:
http://wiki.esipfed.org/index.php/Identifiers_Activity

(Please anyone, feel free to edit/propose more stuff for that list).

I also invite anyone who read this far (yes, both of you) to join
the ESIP Preservation Telecon next Wednesday:

http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/LifeCycle/Preservation_Forum/TeleconNotes/2012-02-08

for more stimulating discussion of identifiers.

Curt

