For the linked data efforts, we are converting ESIP and AGU membership data to RDF, which means creating a URI for each person. One of the challenges is that the data is noisy (John Doe appears along with J. Doe and J. M. Doe), so we need to disambiguate the variants and create sameAs relationships between them. We've been exploring techniques and tools (e.g. Google Refine) that can assist in this process. We have some ideas on how to clean up the RDF data and create a unique set of identifiers for ESIP members, but it's still a work in progress.
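
To make the disambiguation step concrete, here is a rough sketch of one possible first pass; it is not the approach we have settled on. It groups records by surname and first initial, then emits candidate sameAs links as N-Triples. The `http://example.org/person/` namespace, the record ids, and the "Alice Smith" record are made up for illustration, and I am assuming "sameAs" means owl:sameAs.

```python
from collections import defaultdict
from itertools import combinations

BASE = "http://example.org/person/"  # hypothetical namespace, not a real ESIP URI
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def blocking_key(name):
    """Return (surname, first initial), lowercased.

    Assumes a non-empty name written as 'Given [Middle] Surname'.
    """
    parts = name.replace(".", " ").split()
    return (parts[-1].lower(), parts[0][0].lower())


def candidate_groups(records):
    """Group record ids whose names share a blocking key.

    records maps a local record id to a name string. Only groups with
    more than one member are returned, since singletons need no link.
    """
    groups = defaultdict(list)
    for rec_id, name in records.items():
        groups[blocking_key(name)].append(rec_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def same_as_triples(groups):
    """Emit one N-Triples owl:sameAs statement per pair within each group."""
    lines = []
    for ids in groups:
        for a, b in combinations(ids, 2):
            lines.append(f"<{BASE}{a}> <{OWL_SAME_AS}> <{BASE}{b}> .")
    return lines


# Hypothetical records; the ids are invented for this example.
records = {
    "esip-17": "John Doe",
    "agu-203": "J. Doe",
    "agu-877": "J. M. Doe",
    "esip-42": "Alice Smith",
}

groups = candidate_groups(records)
for triple in same_as_triples(groups):
    print(triple)
```

A key this coarse would also put a "Jane Doe" in the same group as "John Doe", so the output should be treated as candidates for review (in Google Refine or by hand), not as asserted sameAs links.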

One thing we have not considered yet is how to map to external identifiers. ORCID looks like a potential way of doing that. I'd be interested in talking with Matt and others who are familiar with this project to see how we might be able to leverage it.


This topic came up at the last ESIP Semantic Web telecon: we need URIs to identify people for some of our linked data efforts. I thought Erin, Tom Narock, or Eric Rozell had done some thinking on how to do this, at least for ESIP members...

> You might already be aware of the activities of ORCID
> http://about.orcid.org/ and its collaborators to address these issues.
>> A bunch of other groups have assigned various sets of identifiers for
>> most of the other things I'm looking at (Thank you GCMD keywords:
>> http://gcmd.nasa.gov/Resources/valids/archives/keyword_list.html).
>> Most of the databases I see seem to ignore the need to unambiguously
>> identify people.
>> Most databases simply fall back on a plain text literal and identify
>> an author as "John Doe" (or even "J. Doe").
>> I want to indicate that "John Doe" the P.I. for an instrument is the
>> same "John Doe" who authored some paper.  I need a clear, unambiguous
>> identifier for that person.
>> I could simply assign an integer as I insert into my database (I know,
>> I know -- I am not a number, I am a free man!).  Another thought is
>> UUIDs, even though they are big and ugly and make even bigger and
>> uglier URIs.
>> foaf:mbox_sha1sum [1] has a certain appeal since independent databases
>> have a prayer of independently assigning the same identifier to the
>> same person, but even that relies on jdoe at nasa.gov keeping the same
>> mbox_sha1sum associated with himself when he becomes johnd at noaa.gov.
>> Keeping the name itself in the URI is nice since you can look at it
>> and know who it is talking about (try that with an embedded UUID), but
>> what do you do when the 2nd (and 3rd) John Doe shows up?  Or if he
>> becomes Jane Doe?
>> Other thoughts?
>> Curt
>> [1] http://xmlns.com/foaf/spec/#term_mbox_sha1sum
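
To illustrate the trade-off between the last two options in the quoted message, here is a minimal sketch of how each identifier could be computed. Per the FOAF spec, mbox_sha1sum is the SHA1 of the mailbox URI including the `mailto:` prefix. The `http://example.org/person/` namespace and the helper names are made up, and the two addresses are the hypothetical ones from the message.

```python
import hashlib
import uuid

BASE = "http://example.org/person/"  # hypothetical namespace


def mbox_sha1sum(email):
    """SHA1 hex digest of the 'mailto:' URI, as foaf:mbox_sha1sum defines it.

    The address is hashed exactly as given; whether to normalize case
    first is a policy choice that independent databases would have to
    agree on.
    """
    return hashlib.sha1(f"mailto:{email}".encode("utf-8")).hexdigest()


def uuid_person_uri():
    """Mint an opaque person URI from a random (version 4) UUID."""
    return BASE + str(uuid.uuid4())


# The same person before and after changing institutions.
old = mbox_sha1sum("jdoe@nasa.gov")
new = mbox_sha1sum("johnd@noaa.gov")

print(old == mbox_sha1sum("jdoe@nasa.gov"))  # two databases derive the same value
print(old == new)                            # a new address breaks the match
print(uuid_person_uri())                     # stable, but unreadable and not derivable
```

The hash can be derived independently by anyone who knows the address but changes when the address does, while the UUID stays fixed but has to be minted once and then shared, which is the same coordination problem ORCID is trying to solve.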
