[Esip-preserve] On Earth Science Data File Uniqueness

Bruce Barkstrom brbarkstrom at gmail.com
Wed Feb 9 10:14:09 EST 2011


You're still missing the point of my argument.  I do not doubt that
a UUID is unique, nor do I doubt that the UUID can be cryptographically
tied to the content of a collection of bits to verify that the bits haven't
been tampered with.
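
To make the distinction concrete, here is a minimal sketch. It assumes the identifier and a SHA-256 checksum are simply stored together in a catalog record. The record layout and the helper names make_record and verify are hypothetical, not an ESIP or NASA convention. The UUID supplies identity; the digest is what detects tampering.

```python
import hashlib
import uuid

def make_record(data: bytes) -> dict:
    """Hypothetical catalog record pairing a random UUID with a SHA-256 fixity digest."""
    return {
        "id": str(uuid.uuid4()),                     # identity: unique, says nothing about the bits
        "sha256": hashlib.sha256(data).hexdigest(),  # fixity: changes if any bit changes
    }

def verify(record: dict, data: bytes) -> bool:
    """Return True if the bits still match the digest stored alongside the UUID."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]

original = b"granule contents"
rec = make_record(original)
print(verify(rec, original))             # True
print(verify(rec, b"granule c0ntents"))  # False
```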

The question on my mind is sort of "If you come across a file with a UUID,
who produced it?"  This isn't quite the same as the unique locator
use case in the paper, since the question isn't "which sources could
provide me with an authentic copy of a particular kind of file?"  It's
more along the lines of "where did this fellow 'Bakak' come from?"
If there's no way to find the associated metadata for the file, the UUID
doesn't give any information, except that this file has a UUID with a
value "..." (assuming you can establish which bits belong to the UUID).
Maybe we should call this use case the "orphan file" problem.
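
The orphan-file problem can be sketched in a few lines. This sketch assumes a hypothetical in-memory registry that stands in for whatever catalog or database would hold the metadata. The names register and who_produced, and the producer string, are illustrative only. A UUID read from a file answers "who produced it?" only if some registry has an entry for it.

```python
import uuid
from typing import Optional

# Hypothetical registry mapping identifiers to provenance metadata.
# In practice this would be a catalog database; here it is an in-memory dict.
registry: dict[uuid.UUID, dict] = {}

def register(producer: str, created: str) -> uuid.UUID:
    """Mint a new UUID and record who made the file and when."""
    file_id = uuid.uuid4()
    registry[file_id] = {"producer": producer, "created": created}
    return file_id

def who_produced(file_id: uuid.UUID) -> Optional[str]:
    """Return the producer if the UUID was registered, else None (an orphan)."""
    entry = registry.get(file_id)
    return entry["producer"] if entry else None

known = register("example processing system", "2011-02-09")
orphan = uuid.uuid4()  # a UUID read from a file nobody registered

print(who_produced(known))   # example processing system
print(who_produced(orphan))  # None
```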

Bruce B.

On Wed, Feb 9, 2011 at 8:56 AM, Curt Tilmes <Curt.Tilmes at nasa.gov> wrote:

> On 02/09/11 08:09, Bruce Barkstrom wrote:
> > The uniqueness of UUIDs wasn't the question.  The point was that if
> > the generator gave out "Bob", "Bill", "Jane", and so on, and there
> > weren't a place to find out who created the object and when, the
> > identifier would simply be another bunch of digital garbage in the
> > file or the result set from the database.  In other words, the IDs
> > have a "social function".
>
> Now you're getting into provenance and metadata.  Those are also
> critical and important issues, related but distinct from
> identification and distinguishing data files.
>
> There are numerous standards for those things and there are
> conventions and standards for connecting them to the data, either
> embedded in the data file (with a rich format), or with a tag-along
> file, or by putting them in a database.
>
> You still need to identify and distinguish the file you just made from
> everything else in the world.  UUIDs give you a nice, easy way to do
> that.
>
> A UUID can be a great primary key for the files in a database.  If you
> have your own key, it might not match someone else's key for that same
> granule, or worse, it might duplicate their key for a different
> granule.  We're trying to converge on something that everyone could
> use that would be guaranteed to be globally unique forever.  I would
> argue that is useful even if there isn't a single central database of
> every granule in the world forever.  (That would be nice, and we may
> get there sometime, but practically, I don't see that happening in the
> near future.)
>
> We put a lot of thought/effort into MODIS localgranuleids, for
> example, but we still blew up the database on at least one occasion by
> making two granules with the same localgranuleid.
>
> MODIS localgranuleids (filenames) include a bunch of basic metadata so
> they are human friendly.  They're probably pretty incoherent to
> someone totally unfamiliar with them, but with a little knowledge you
> can decipher them visually and tell what you're looking at.
> Unfortunately, that basic metadata can end up identical, and we need
> unique identifiers.  We thought we'd be smart and tacked on a
> "production time stamp".  That should always make them unique, right?
> Well, in testing, if you kick off a bunch of processing and happen to
> make the same granule with the same basic metadata (the stuff in the
> filename) at the same time, you end up making two granules with the
> same localgranuleid.
>
> Today, MODIS processing is still pretty hard.  Very few people try to
> reproduce the processing done in the central system.  I'd like to make
> that easier and more accessible.  If we do that, we need a clean way
> to always distinguish the identifiers that everyone is using when they
> make the same granule the same way.  There are many complicated ways
> we could do that.  Each relies on people following some conventions,
> or centrally registering somewhere, or any one of many different
> schemes we could suggest to make unique identifiers.
>
> There's also an easy way -- just say use UUID for everything.
>
> Curt
> _______________________________________________
> Esip-preserve mailing list
> Esip-preserve at lists.esipfed.org
> http://www.lists.esipfed.org/mailman/listinfo/esip-preserve
>
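
The localgranuleid collision described above, and the idea of using a UUID as the database primary key, can be sketched with Python's standard sqlite3 module. This is a minimal illustration under stated assumptions. The granule_name pattern (basic metadata plus a production time stamp to the second) is hypothetical and only loosely modeled on the description. It is not the actual MODIS LocalGranuleID specification. The table layout and the store helper are also invented for the example.

```python
import sqlite3
import uuid
from datetime import datetime

def granule_name(product: str, data_time: str, version: str, produced: datetime) -> str:
    """Hypothetical name: basic metadata plus a production time stamp to the
    second. Illustrative only; not the actual MODIS LocalGranuleID spec."""
    return f"{product}.{data_time}.{version}.{produced:%Y%j%H%M%S}.hdf"

def store(db: sqlite3.Connection, key: str, name: str) -> bool:
    """Insert a granule row; return False if the primary key already exists."""
    try:
        db.execute("INSERT INTO granules (id, name) VALUES (?, ?)", (key, name))
        return True
    except sqlite3.IntegrityError:
        return False

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE granules (id TEXT PRIMARY KEY, name TEXT)")

# Two test jobs make the same granule, same metadata, in the same second.
now = datetime(2011, 2, 9, 8, 56, 0)
name_a = granule_name("MOD021KM", "A2011040.1015", "005", now)
name_b = granule_name("MOD021KM", "A2011040.1015", "005", now)
print(store(db, name_a, name_a))  # True
print(store(db, name_b, name_b))  # False: the names collide, second insert rejected

# Keying on a random (version 4) UUID keeps both rows; the human-friendly
# name is still stored, but as an ordinary non-unique column.
print(store(db, str(uuid.uuid4()), name_a))  # True
print(store(db, str(uuid.uuid4()), name_b))  # True
```

Storing the human-friendly name as a separate column keeps it readable for people, while the UUID carries the uniqueness guarantee.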