[Esip-preserve] A Concern

Ruth Duerr rduerr at nsidc.org
Wed Sep 15 12:33:24 EDT 2010


Hi Bruce,

On Sep 14, 2010, at 9:45 AM, alicebarkstrom at verizon.net wrote:

> In the course of working on a presentation on the role of
> data formats in information preservation, I found that it
> is possible to create several different groupings of numerical
> data that have identical scientific content but that
> are different in ways that would prevent a cryptographic
> digest from identifying them as identical.  As a result,
> I'm reasonably certain that for Earth science data, files
> that have the same content are not unique in their form.
> A simple example arises from data that could be stored
> in a database but that has one instance that is normalized
> and another instance (of the same data) that is not normalized.
>  
I think this is the issue of scientific identity or equivalence that Curt has previously brought up.  I do note some interesting work being done in the social sciences on Universal Numerical Fingerprints (http://thedata.org/citation/standard) that has relevance here.  Unfortunately, it looks like considerable effort would need to go into developing a "fingerprint" for each kind of data.
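
To make the digest problem concrete, here is a minimal Python sketch.  It assumes two made-up layouts of the same three numbers (comma-separated text and packed little-endian doubles) and an illustrative canonical form (seven significant digits, one value per line).  The helper names are hypothetical, and this is not the published fingerprint algorithm, only something loosely in its spirit.

```python
import hashlib
import struct

values = [1.5, 2.25, 3.0]

# Two hypothetical layouts holding the same numbers.
as_text = ",".join(repr(v) for v in values).encode("ascii")  # CSV-like text
as_binary = struct.pack("<3d", *values)                      # little-endian float64


def digest(data: bytes) -> str:
    """SHA-256 over the raw bytes, i.e. a digest of the layout."""
    return hashlib.sha256(data).hexdigest()


def read_text(data: bytes) -> list:
    return [float(x) for x in data.decode("ascii").split(",")]


def read_binary(data: bytes) -> list:
    return list(struct.unpack(f"<{len(data) // 8}d", data))


def content_digest(nums) -> str:
    """Digest of an assumed canonical form: 7 significant digits per value."""
    canonical = "\n".join(f"{v:.6e}" for v in nums).encode("ascii")
    return digest(canonical)


# The raw digests differ even though the numbers are the same.
raw_match = digest(as_text) == digest(as_binary)

# Digests of the decoded, canonicalized values agree.
content_match = (
    content_digest(read_text(as_text)) == content_digest(read_binary(as_binary))
)

print("raw digests match:    ", raw_match)
print("content digests match:", content_match)
```

The catch is visible in the sketch: each layout needs its own reader, and the canonical form (precision, ordering, handling of missing values) has to be agreed on for each kind of data.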
> I'll also note that in the likely event of transformational
> migration, it seems probable that there will be multiple
> locations that can be identified as holders of authentic
> data, although the files in the collections may be quite
> different in their layout.  LOCKSS is, of course, a prime
> example of this dispersal of authenticity.
>  
Actually, LOCKSS is interesting in this regard: while they do hold voting to ensure that all copies are the same, they explicitly have a single source (the "authentic, authorized" version) from which copies are obtained.  This is similar to NSIDC's concepts of primary archive and backup archives...  Our responsibilities are different depending on whether we are the primary or a backup...
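
As a rough illustration of the voting idea only (this is not the actual LOCKSS polling protocol, and the site names and helper functions are invented for the example), a simple majority vote over per-copy digests might look like this in Python:

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def find_divergent(copies: Dict[str, bytes]) -> List[str]:
    """Return the sites whose copy disagrees with the majority digest."""
    digests = {site: digest(data) for site, data in copies.items()}
    # On a tie, Counter.most_common keeps the digest seen first.
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(site for site, d in digests.items() if d != majority)


# Hypothetical holdings: one primary archive and two backups.
copies = {
    "primary": b"granule-v1",
    "backup_a": b"granule-v1",
    "backup_b": b"granule-vX",  # a damaged or altered copy
}

divergent = find_divergent(copies)
print(divergent)
```

Note that a vote like this only works when every site holds byte-identical files.  After a transformational migration the raw digests would disagree even for intact copies, which is where the equivalence question comes back in.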
> I think this suggests some discussion is needed regarding
> what we mean by uniqueness and authenticity, as well as
> some work regarding reliability of survival of information.
>  
If you can define concrete, doable tasks that the group could tackle, that would be great!

Ruth