[Esip-preserve] Possible Workaround for data identity non-uniqueness?
alicebarkstrom at frontier.com
Wed Oct 13 12:18:00 EDT 2010
A third use case that is important for preservation:
3. The original file is lost or unreadable, although I've got
copies in a variant format and with rearrangements. I've
also got mappings that we believe demonstrate scientific
equivalence from one or more of these copies. In this case,
we might not be able to recreate an exact copy of the original
file because the process of demonstrating SEC did not preserve
some elements of the original file. For example, the original
was created in FORTRAN, while the only copy we've got is in
HDF and contains array dimensions that weren't in the original
FORTRAN.
This case can be extended to multiple mappings, where now the
intermediate copies are also lost or unreadable. Maybe the third copy
is in XML - and the tags were not part of either the first or
second copies. Note that if the chain of possible copies gets
long enough, there are likely to be multiple paths by which
equivalence can be established. This may mean that equivalence
becomes a stochastic variable where we ask "how many comparisons
do I have to make to establish that the probability of my not
having an authentic copy is below some threshold T?"
Bruce B.
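
A minimal sketch of the "how many comparisons" question above, under
strong assumptions that are not in the original message: each comparison
is independent, and each one has the same probability (miss_prob, a
hypothetical parameter) of passing even though the copy is not authentic.
Under that model the residual doubt after n passing comparisons is
miss_prob**n, and we stop once it drops below T.

```python
def comparisons_needed(miss_prob: float, threshold: float) -> int:
    """Smallest n such that miss_prob**n is strictly below threshold.

    miss_prob: assumed probability that one comparison passes although
               the copy is NOT authentic (hypothetical, must be in (0, 1)).
    threshold: the acceptable residual doubt T (must be in (0, 1)).
    """
    if not 0.0 < miss_prob < 1.0:
        raise ValueError("miss_prob must be strictly between 0 and 1")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")

    n = 0
    residual = 1.0  # worst case: no evidence of authenticity yet
    while residual >= threshold:
        residual *= miss_prob  # one more independent passing comparison
        n += 1
    return n


if __name__ == "__main__":
    # With a 50% chance of a false pass per comparison and T = 0.01:
    print(comparisons_needed(0.5, 0.01))
```

Real comparison paths through a chain of copies will usually share
intermediate files, so they are unlikely to be fully independent; the
count above is then an optimistic lower bound.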
----- Original Message -----
From: "Curt Tilmes" <Curt.Tilmes at nasa.gov>
To: esip-preserve at lists.esipfed.org
Sent: Wednesday, October 13, 2010 9:11:09 AM
Subject: Re: [Esip-preserve] Possible Workaround for data identity non-uniqueness?
On 10/13/10 08:56, Lynnes, Christopher S. (GSFC-6102) wrote:
> Is there perhaps a workaround where the reformatting agent simply
> asserts that they are equivalent? That is, to add a metadata
> attribute that says, "this file is scientifically equivalent to this
> other file (e.g., identified by uuid)"?
Then we have to start tagging them with "Justification" and "Trust"
facts as well...
I see (at least) two use cases we are concerned with for scientific
equivalence:
1. The reformatting case. I have data from some authoritative source,
and I want to do a transformation that maintains what we are
calling the "scientific equivalence class" (SEC).
As you propose, we could use the "authoritative source" UUID as a
SEC identifier, and keep that with the transformed data.
My justification could be that I validated my transformation
process and assert that it does maintain that property.
2. The reproduction case. I have a granule and I want to repeat the
processing in such a way that the resulting file is in the same SEC
as the original.
My justification could be that I have replicated the processing
steps sufficiently to maintain that property.
For example, consider "process on demand" where the original file
was deleted, but the producer maintains sufficient provenance
information to re-make a new file (with a distinct UUID) that
should be in the same SEC.
Or a web service transformation. I can store a
WCS/WFS/WMS/etc. REST URL with all the parameters used to produce a
file. If I call it with those parameters and you call it with
identical parameters, we should get files in the same SEC.
Curt
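
The two use cases above could be sketched as a small metadata record.
This is only an illustration: the class and field names
(EquivalenceAssertion, sec_id, justification, asserted_by) are
hypothetical, and deriving a SEC identifier from a canonicalized web
service request is an assumption, not something the thread specifies.

```python
import uuid
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class EquivalenceAssertion:
    """Hypothetical record attached to a derived file."""
    file_uuid: uuid.UUID   # distinct identity of this particular file
    sec_id: uuid.UUID      # scientific equivalence class it claims to be in
    justification: str     # why the asserting agent believes the claim
    asserted_by: str       # who makes the claim (input to any trust decision)


def assert_reformatted(source_uuid: uuid.UUID, justification: str,
                       asserted_by: str) -> EquivalenceAssertion:
    """Reformatting or reproduction case: the new file gets a fresh UUID
    but reuses the authoritative source UUID as its SEC identifier."""
    return EquivalenceAssertion(uuid.uuid4(), source_uuid,
                                justification, asserted_by)


def service_sec_id(base_url: str, params: dict) -> uuid.UUID:
    """Web service case (assumed scheme): identical parameters, in any
    order, map to the same name-based UUID."""
    canonical = base_url + "?" + urlencode(sorted(params.items()))
    return uuid.uuid5(uuid.NAMESPACE_URL, canonical)


def same_sec(a: EquivalenceAssertion, b: EquivalenceAssertion) -> bool:
    """Two files claim equivalence when their SEC identifiers match."""
    return a.sec_id == b.sec_id


if __name__ == "__main__":
    source = uuid.uuid4()  # stands in for the authoritative file's UUID
    hdf = assert_reformatted(source, "validated FORTRAN-to-HDF transform", "agent A")
    redo = assert_reformatted(source, "replicated processing steps", "agent B")
    print(same_sec(hdf, redo), hdf.file_uuid != redo.file_uuid)
```

Matching SEC identifiers only record that equivalence was asserted;
whether to believe it still depends on the justification and on how
much the asserting agent is trusted.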