[Esip-preserve] On Earth Science Data File Uniqueness

Curt Tilmes Curt.Tilmes at nasa.gov
Wed Feb 9 07:48:32 EST 2011


On 02/07/11 10:45, Bruce Barkstrom wrote:
> Here's another wrinkle: if a UUID is formed in the field but no one
> registers it someplace what good is the identifier?

UUIDs have some nice properties.  They are, for all practical
purposes, globally unique forever and can be generated on the fly in
the field without any central authority.

If I get a chunk of data (from anywhere -- capture from an instrument,
generate from some data transformation, create on the fly --
anything), I want to tag it with something permanent that I can attach
to it that will follow and distinguish that chunk of data forever.

With UUIDs, I don't have to register anywhere (though I can, and it
would add to their usefulness), or ask someone to make or assign me
an id, or anything.  I just make them.  It takes trivial computation
to make as many of them as I would ever need at any point of data
collection or generation.
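
To illustrate how little is involved, here is a minimal sketch in
Python using the standard library's uuid module.  The function name
new_granule_id and the batch size of 1000 are made up for this
example; nothing here is specific to any particular data system.

```python
import uuid


def new_granule_id() -> str:
    """Return a fresh random (version 4) UUID in canonical string form.

    No registry or naming authority is contacted; the value is built
    locally from random bits.  (new_granule_id is a hypothetical name.)
    """
    return str(uuid.uuid4())


# Make a batch of identifiers, one per chunk of data.
ids = [new_granule_id() for _ in range(1000)]
print(ids[0])
```

Version 4 UUIDs are random, so uniqueness is a matter of overwhelming
probability rather than a guarantee enforced by anyone.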

If you don't want to make them yourself, you can even get them from
many places around the internet.

Here's one such place: http://uuid-service.appspot.com/?output=plain

They're giving them out for free!  As many as you want.  They've got
plenty, don't worry about them running out.

They aren't tied to any controlled namespace or naming authority.  You
don't have to check with anyone, or go through hours of naming
discussions or anything prior to making them.  (I would bet we spent
hundreds of man hours on MODIS naming conventions and many people
still hate them.)

If I can persistently tie that tag -- that identifier -- to that chunk
of data, I can always tell it apart from any other chunk of data in
the world.  I can always compare two tags and tell if the two objects
are the same.
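
The comparison itself can be sketched as follows, again in Python.
One assumption worth stating: the same UUID may be written in
different textual forms (upper or lower case, with or without
braces), so this sketch parses both tags before comparing rather than
comparing raw strings.  The function name same_object and the sample
values are hypothetical.

```python
import uuid


def same_object(tag_a: str, tag_b: str) -> bool:
    """Return True if two UUID tags denote the same identifier.

    Parsing with uuid.UUID normalizes case, braces, and hyphens, so
    only the underlying 128-bit value is compared.
    """
    return uuid.UUID(tag_a) == uuid.UUID(tag_b)


# Two spellings of one identifier, plus an unrelated one.
a = "6fa459ea-ee8a-3ca4-894e-db77e160355e"
b = "{6FA459EA-EE8A-3CA4-894E-DB77E160355E}"
c = str(uuid.uuid4())

print(same_object(a, b))
print(same_object(a, c))
```

This only says whether two tags are the same identifier; whether a
tag is still attached to the right chunk of data is a separate
question.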

(Note, these are my "pro-UUID" arguments -- I've previously posted
"anti-UUID" arguments, which exist as well.)

> Also, I think we need to be a lot more sensitive to the kinds of
> "objects" we're trying to identify.

Agree 100%.  I think this will fall out of the "Provenance and Context
Content Standard".  We will identify and organize a comprehensive list
of "objects" and recommend how to identify each of them.  In that
hierarchy, one of the "objects" will be "data granules", and one of
the identifiers for them (there will be more than one) should probably
be a UUID (based on the conclusions of the identifiers analysis work).

Curt

