[Esip-preserve] Stewardship Best Practices - Identifiers

Curt Tilmes Curt.Tilmes at nasa.gov
Thu Oct 7 10:41:02 EDT 2010


On 10/07/2010 10:23 AM, alicebarkstrom at frontier.com wrote:
> The alternative is to be able to verify that files hold scientifically
> identical data by computing whether the alternatives have the same
> values.

The Altman paper Ruth cited on page 19 discusses this a bit:
http://www.springerlink.com/content/j13u6pwh837q2711/
"A Fingerprint Method for Scientific Data Verification".

Basically, it produces a hash of a canonical representation of
the data.  Regardless of the file format, the prescribed canonical
representation is the same, so the hashes are comparable.
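
To make the idea concrete, here is a rough Python sketch.  It is not
the algorithm from the paper, only an illustration of the general
technique.  The function name, the choice of 7 significant digits, the
text encoding and the use of SHA-256 are all assumptions made for the
example.

```python
import hashlib


def data_fingerprint(values, digits=7):
    """Hash a canonical text form of a sequence of numbers.

    Hypothetical helper: each value is written in scientific notation
    with a fixed number of significant digits, so the result does not
    depend on how the numbers were stored in the original file.
    """
    h = hashlib.sha256()
    for v in values:
        # Canonical form, e.g. 0.1 -> "1.000000e-01" when digits=7.
        canon = format(float(v), f".{digits - 1}e")
        h.update(canon.encode("ascii") + b"\n")
    return h.hexdigest()


# The same values read from two different formats (say, integers in one
# file and floats in another) give the same fingerprint.
from_format_a = [1, 2, 3]
from_format_b = [1.0, 2.0, 3.0]
print(data_fingerprint(from_format_a) == data_fingerprint(from_format_b))
```

Rounding to a fixed number of significant digits is the design choice
that decides what counts as "scientifically identical" here; a real
scheme would also have to pin down ordering, missing values and
metadata.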

For numerous reasons (you point out several), that isn't sufficient
for our needs, but with some more work, it could be adapted to help us
perform a similar function.


I've been working on a comparable method, taking hashes of a canonical
representation of the provenance of a file and using that as a
fingerprint to compare two files.
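
A minimal sketch of what that could look like, in Python.  The record
layout (process name, version, list of input identifiers), the function
names, the canonical JSON serialization and SHA-256 are assumptions for
illustration only, not a description of an existing system.

```python
import hashlib
import json


def canonical_provenance(process, version, inputs):
    """Serialize a provenance record in one prescribed way.

    Hypothetical layout.  Input identifiers are sorted on the assumption
    that their order does not matter; keys are sorted and whitespace is
    removed so equal records always yield identical text.
    """
    record = {
        "process": process,
        "version": version,
        "inputs": sorted(inputs),
    }
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


def provenance_fingerprint(process, version, inputs):
    """Hash the canonical provenance text to get a comparable fingerprint."""
    canon = canonical_provenance(process, version, inputs)
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()


# Two files made by the same process and version from the same inputs
# get the same fingerprint, whatever order the inputs were listed in.
a = provenance_fingerprint("example-process", "1.0", ["input-1", "input-2"])
b = provenance_fingerprint("example-process", "1.0", ["input-2", "input-1"])
print(a == b)
```

If the inputs are themselves identified by their own provenance
fingerprints, a change anywhere upstream changes the fingerprint of
everything derived from it.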


I think we need to work on both approaches: ways to identify,
distinguish and compare content, and ways to identify, distinguish and
compare provenance.

Curt

