[Esip-preserve] On Earth Science Data File Uniqueness

Curt Tilmes Curt.Tilmes at nasa.gov
Wed Feb 9 13:14:06 EST 2011


On 02/09/11 13:08, Lynnes, Christopher S. (GSFC-6102) wrote:

>Your use case may be plausible, but does not seem to be very common
>in the wild.  I suspect any recommendation to use UUIDs that goes out
>for review by us more thick-skulled practitioners is going to need a
>more compelling use case, esp. when put up against the clear
>practical benefit of checksums.  In other words, you will need to
>answer the question: why is a UUID a better unique identifier of
>contents than the contents' SHA-1 or MD5 checksum?  And what
>practical benefit does it buy me?

It is not an identifier for the content at all.  It is an identifier
for the object.

If two granules have the same object identifier, you are talking about
the same object (two copies of the same object).  If you want to
verify the fixity of the content, the UUID won't give you that.  You
still need to use SHA-1 or MD5 or whatever to verify integrity/fixity.
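To make the distinction concrete, here is a minimal Python sketch. It assumes a granule can be modeled as a hypothetical `Granule` record holding an object identifier (a UUID) and its bytes. The UUID answers "is this the same object?", and a SHA-1 checksum answers "are these bytes intact?". The names `fixity`, `same_object`, and `verify_copy` are illustrative only, not part of any ESIP recommendation.

```python
import hashlib
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Granule:
    """Hypothetical granule record: an object identifier plus its bytes."""
    object_id: uuid.UUID
    content: bytes


def fixity(content: bytes) -> str:
    """Checksum of the bytes, used to verify integrity, not identity."""
    return hashlib.sha1(content).hexdigest()


def same_object(a: Granule, b: Granule) -> bool:
    """Two granules are copies of one object iff they share an object ID."""
    return a.object_id == b.object_id


def verify_copy(original: Granule, copy: Granule) -> bool:
    """A copy is good only if it is the same object AND its bytes are intact."""
    return same_object(original, copy) and fixity(original.content) == fixity(copy.content)


original = Granule(uuid.uuid4(), b"granule bytes")
good_copy = Granule(original.object_id, original.content)
corrupt_copy = Granule(original.object_id, b"granule bytez")  # one byte damaged

print(same_object(original, corrupt_copy))  # True: the UUID says nothing about the bytes
print(verify_copy(original, good_copy))     # True
print(verify_copy(original, corrupt_copy))  # False: only the checksum catches the damage
```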


Some data models choose to use a hash of the content as an identifier
for the object.  We could choose to do that as well.  I think that is
a valid and useful approach.  It does, however, impose certain
constraints.  It assumes that if the content of two objects is
identical, then the objects are identical.  It precludes the
possibility of making two distinct objects with the same content.  If
that is an acceptable constraint, then we could propose one of the
cryptographic hash (message digest) schemes as our recommendation for
data granule identifiers.  Since one of our goals is reproducibility
(striving to make data granules the same way, with equivalent
content), we may be at cross purposes with ourselves: a faithfully
reproduced granule would get the same identifier as the original, so
the two could no longer be told apart.
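A minimal sketch of that constraint, assuming a reproduced granule comes out byte-identical to the original. `content_id` (SHA-256 here, chosen only for illustration; SHA-1 or MD5 would behave the same way in this example) and `minted_id` (a random UUID) are hypothetical helpers, and the two dictionaries stand in for a catalog keyed by identifier.

```python
import hashlib
import uuid


def content_id(content: bytes) -> str:
    """Identifier derived from the bytes themselves (hypothetical helper)."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def minted_id() -> str:
    """Identifier minted independently of the bytes, as a UUID URN (hypothetical helper)."""
    return uuid.uuid4().urn


# Hypothetical original run and a faithful reproduction with byte-identical output.
original_bytes = b"granule from processing run 1"
reproduced_bytes = b"granule from processing run 1"

catalog_by_content = {}  # content-derived identifier -> granules filed under it
catalog_by_uuid = {}     # minted identifier -> granule

for label, data in [("original", original_bytes), ("reproduction", reproduced_bytes)]:
    catalog_by_content.setdefault(content_id(data), []).append(label)
    catalog_by_uuid[minted_id()] = label

print(len(catalog_by_content))  # 1: both granules collapse onto one identifier
print(len(catalog_by_uuid))     # 2: each granule stays a distinct object
```

Under content-derived identifiers the reproduction is indistinguishable from the original; whether that is a feature or a problem depends on whether we want the two treated as one object.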

Curt

