[Esip-preserve] ESIP Citation Guidelines

Curt Tilmes Curt.Tilmes at nasa.gov
Mon Oct 11 17:22:54 EDT 2010


On 10/11/2010 05:03 PM, alicebarkstrom at frontier.com wrote:
> At least from my perspective (probably gloomily of Scandinavian
> genetic predisposition), until we've got a threat analysis that
> moves in the direction of quantifying the probability of identifiers
> "coming loose" from the data itself, as well as the probability of
> detecting changes, and some approach to auditing for corruption, our
> job on this is far from done.

Yes.  We agree the job is far from done.

> I'll also note that I don't think we've done an adequate job of
> taking into account the difficulties of dealing with format and data
> order rearrangements.  I am quite certain that it is unfeasible to
> provide a draconian standardization of data formats and data file
> interpretations.  As a result, cryptographic digests only protect
> against tampering with the bits in a file - but they don't deal with
> the question of being able to uniquely identify two files with
> scientifically identical data that have different cryptographic
> digests (or bit-by-bit intercomparisons).

Yes, these are related to the OAIS "Fixity" requirement.
We need work there too.

Do you think it is possible to adapt the UFN approach previously
mentioned to our earth science data?  It addresses (some, but not all
of) the things you discuss here.

Additionally, I think reproducibility through complete provenance
capture helps address this (though I acknowledge it doesn't solve it).
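
To make the gap concrete, here is a minimal sketch (in Python) of the
difference between a digest of the bits and a fingerprint of the content.
It is not the UFN approach itself, only an illustration in the same
spirit.  The record layout, the rounding to 7 significant digits, and
the names `content_fingerprint` and `decode` are all assumptions made
up for this example.

```python
import hashlib
import struct


def bit_digest(payload: bytes) -> str:
    """Digest of the physical bits, as stored in the file."""
    return hashlib.sha256(payload).hexdigest()


def canonical_record(record, sig_digits=7):
    # Render each number in a fixed textual form so that byte order and
    # storage format no longer matter (sig_digits is an assumed tolerance).
    return ",".join(format(float(x), f".{sig_digits - 1}e") for x in record)


def content_fingerprint(records, sig_digits=7):
    """Hypothetical format- and order-independent fingerprint."""
    # Sorting whole records removes dependence on record order while
    # keeping the values within each record associated with each other.
    lines = sorted(canonical_record(r, sig_digits) for r in records)
    return hashlib.sha256("\n".join(lines).encode("ascii")).hexdigest()


def decode(payload, byte_order, width=3):
    # Unpack a flat array of doubles back into records of `width` values.
    n = len(payload) // 8
    flat = struct.unpack(f"{byte_order}{n}d", payload)
    return [flat[i:i + width] for i in range(0, n, width)]


# Made-up (lat, lon, value) records, written two different ways.
records = [(10.0, 20.0, 271.15), (10.0, 21.0, 284.5), (11.0, 20.0, 300.25)]
file_a = struct.pack(">9d", *[x for r in records for x in r])
file_b = struct.pack("<9d", *[x for r in reversed(records) for x in r])

same_bits = bit_digest(file_a) == bit_digest(file_b)
same_content = (content_fingerprint(decode(file_a, ">"))
                == content_fingerprint(decode(file_b, "<")))
```

Here `same_bits` is false and `same_content` is true.  Note that such a
fingerprint only works once someone has decided what "scientifically
identical" means (precision, which fields count, whether order matters),
which is exactly the hard part.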

> This line of reasoning strongly suggests that the notion of a
> "unique authentic version" of a file is impossible.

We have typically relied on a trusted curator to manage and affirm
this.  We can't prove it (especially in the case of a malicious
curator), but we log cryptographic digests as we produce data, and
distribute them with the data files.  We (EOSDIS) dictate formats for
standard data products for the "authoritative" version and that is
what gets archived and distributed.  As you point out, this only
checks the physical bits.
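
As a rough sketch of what "log digests and distribute them with the
data" can look like, assuming SHA-256 and a simple manifest file that
sits in the same directory as the data files (the manifest layout and
the function names are hypothetical, not a description of the EOSDIS
implementation):

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Digest a file in chunks so large granules need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(data_files, manifest_path):
    # One "<digest>  <file name>" line per data file, logged at production.
    lines = [f"{file_digest(p)}  {Path(p).name}" for p in data_files]
    Path(manifest_path).write_text("\n".join(lines) + "\n")


def verify_manifest(manifest_path):
    """Return {file name: True/False} for each entry in the manifest."""
    base = Path(manifest_path).parent
    results = {}
    for line in Path(manifest_path).read_text().splitlines():
        expected, name = line.split("  ", 1)
        results[name] = file_digest(base / name) == expected
    return results
```

A recipient would call `verify_manifest` on the manifest shipped with
the files.  A mismatch shows that the bits changed somewhere along the
way; a match says nothing about whether the curator who wrote the
manifest was trustworthy.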

In some cases, like the MODIS "process on demand" L1B, we can't do
that.  We assert that we have the ability to reproduce an equivalent
file, although the current implementation actually performs what I
call "reprocessing" rather than "reproducing".  The difference is that
reprocessing can use better versions of ancillary data files, or later
versions of the algorithms, whereas reproducing is a faithful attempt
to make the same file.
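
The distinction can be stated in terms of provenance records.  A
minimal sketch, assuming each production run records its algorithm
version and the digests of its source and ancillary inputs (the
`Provenance` fields and `classify_regeneration` are hypothetical names
for illustration only):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    algorithm_version: str
    input_digests: frozenset      # digests of the source data
    ancillary_digests: frozenset  # digests of ancillary data files


def classify_regeneration(original: Provenance, rerun: Provenance) -> str:
    # Different source data means we are not regenerating the same product.
    if original.input_digests != rerun.input_digests:
        return "different product"
    # Same algorithm and same ancillary files: a faithful attempt.
    if (original.algorithm_version == rerun.algorithm_version
            and original.ancillary_digests == rerun.ancillary_digests):
        return "reproducing"
    # Same source data, but newer algorithm or ancillary inputs.
    return "reprocessing"


first = Provenance("5.0.1", frozenset({"in-1"}), frozenset({"anc-1"}))
later = Provenance("5.0.2", frozenset({"in-1"}), frozenset({"anc-2"}))
kind = classify_regeneration(first, later)
```

Here `kind` is "reprocessing", because the algorithm version and the
ancillary files both changed.  Even a run classified as "reproducing"
is only expected to give an equivalent file, not necessarily the same
bits.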

Curt

