[Esip-preserve] Another Pleasantry - Unique Identifiers for "Jobs"

Tue Aug 17 09:21:25 EDT 2010

On 08/17/2010 08:47 AM, alicebarkstrom at verizon.net wrote:
> While we've had fun with unique identifiers for files and file
> collections, we haven't paid much attention to process
> identifiers. The math is clear: production involves a graph whose
> nodes are files and "jobs", even if we have to deal with ad hoc (or
> exploratory) production. To do provenance tracking, you have to be
> able to do a breadth-first search of the production graph, which
> means that for production history, the jobs need unique identifiers
> as much as the files (or - in some odd cases for Earth science -
> database transactions). In other words, provenance tracking is going
> to require unique identifiers for the residue of "jobs". If you
> don't have these, you can't be sure of being able to reconstruct the
> production history provenance.
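
The breadth-first search described above can be sketched in a few lines. This is only an illustration: the graph, the node names, and the `provenance` helper are all made up for the example, and the "file:" / "job:" prefixes stand in for whatever unique identifiers are eventually chosen.

```python
from collections import deque

# Hypothetical production graph: every node (a file or a job) maps to the
# nodes it was directly derived from. All identifiers here are invented.
PRODUCED_FROM = {
    "file:L2-granule": ["job:run-42"],
    "job:run-42": ["file:L1-granule", "file:calibration-table"],
    "file:L1-granule": ["job:run-7"],
    "job:run-7": ["file:L0-raw"],
}

def provenance(start, produced_from):
    """Return every ancestor of `start` in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        # A node with no entry has no recorded ancestors.
        for parent in produced_from.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return order

history = provenance("file:L2-granule", PRODUCED_FROM)
```

The search only works because the job nodes have keys of their own; without an identifier for "job:run-42" there is nothing to link the L2 granule back to its inputs.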

There are others, but the main options are the same as for granules:

1) UUID: assign a globally unique identifier to each instance

2) URI (PURL, ARK, XRI, etc.: something persistent that is a URI)

or both.  (I also like the hierarchical URN approach, as in SPASE.)

If you choose UUID, the next step is to discuss resolution, which
inevitably leads to 2.  Though I do think resolution is less important
for jobs than for files, I think it is reasonable and useful to
provide it for both jobs and files (granules).
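
As a rough sketch of how option 1 leads to option 2: mint a UUID, then wrap it in a URI. The `urn:uuid:` form is the standard URN for UUIDs; the resolver base URL and the `mint_identifier` helper below are hypothetical, standing in for whatever PURL/ARK/etc. service an organization actually runs.

```python
import uuid

# Hypothetical resolver base; a real deployment would use its own
# persistent-identifier service here.
RESOLVER_BASE = "https://example.org/id/"

def mint_identifier(kind):
    """Mint a random UUID and return it with two URI forms.

    `kind` is an illustrative label such as "job" or "granule".
    """
    ident = uuid.uuid4()
    return {
        "uuid": str(ident),
        "urn": ident.urn,  # "urn:uuid:<uuid>"
        "url": f"{RESOLVER_BASE}{kind}/{ident}",  # resolvable form (assumed layout)
    }

job_id = mint_identifier("job")
granule_id = mint_identifier("granule")
```

The same function serves jobs and granules, which is the point: once a job has an identifier of this shape, it can be resolved the same way a file can.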

If we just say "URI", then any of the URI-like schemes can be
accommodated.  One organization (for whatever reason) may want to use
ARKs, and another may want to use PURLs, etc.  As long as there is a
unique, persistent, resolvable URI, it all works.

Provenance tracking within an organization is easy (not trivial, but
at least straightforward).  The bigger problem is across
organizations -- if we can address that, the solution to the local
problem falls out naturally.

If we like OPM (http://openprovenance.org/) [I do], then we can
recommend the XML or RDF serialization of graphs represented with that
model:

http://openprovenance.org/examples/pc1-time.xml (XML)
http://openprovenance.org/examples/pc1-time.n3  (RDF)

We're then talking about using our recommended granule identifiers in
the <opm:artifact id="..."> elements and our recommended job
identifiers in the <opm:process id="..."> elements.  (Either UUID or
URI or something else.)
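
To make that concrete, here is a minimal sketch that emits artifact and process elements carrying our identifiers. It is not a valid OPM document: the namespace URI is a placeholder, the `opmGraph` wrapper and the `opm_fragment` helper are assumptions for illustration, and I have not checked what the OPM schema allows in the id attribute. The real structure is in the pc1-time.xml example.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace, NOT the real OPM namespace; take the real one
# from the OPM schema or the pc1-time.xml example.
OPM_NS = "http://example.org/opm-placeholder"
ET.register_namespace("opm", OPM_NS)

def opm_fragment(granule_ids, job_ids):
    """Build an illustrative XML fragment with artifact and process ids."""
    root = ET.Element(f"{{{OPM_NS}}}opmGraph")  # wrapper name is an assumption
    for gid in granule_ids:
        # Granule identifiers go in the artifact id attribute.
        ET.SubElement(root, f"{{{OPM_NS}}}artifact", {"id": gid})
    for jid in job_ids:
        # Job identifiers go in the process id attribute.
        ET.SubElement(root, f"{{{OPM_NS}}}process", {"id": jid})
    return ET.tostring(root, encoding="unicode")

xml_text = opm_fragment(["granule-0001"], ["job-0001"])
```

The identifiers are opaque strings here, so the same code works whether the recommendation ends up being UUID, URI, or something else.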

Curt