[Esip-preserve] Possible Workaround for data identity non-uniqueness?

Lynnes, Christopher S. (GSFC-6102) christopher.s.lynnes at nasa.gov
Wed Oct 13 08:56:23 EDT 2010


I agree with Curt's assessment that the canonicalization has practical problems for data that has been reformatted in a way that does not affect the content.

Is there perhaps a workaround where the reformatting agent simply asserts that the two files are equivalent? That is, the agent would add a metadata attribute that says, "this file is scientifically equivalent to this other file (e.g., identified by uuid)".
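
To make the idea concrete, here is a minimal sketch in Python of how such an assertion might be recorded and resolved. The attribute name "scientifically_equivalent_to", the dictionary-based records, and the in-memory catalog are all assumptions for illustration only; none of them comes from an existing metadata convention.

```python
import uuid

# Hypothetical attribute name; not part of any existing metadata convention.
EQUIV_ATTR = "scientifically_equivalent_to"


def new_record(name, equivalent_to=None):
    """Create a metadata record with a fresh UUID and an optional
    assertion that it is scientifically equivalent to another record."""
    record = {"uuid": str(uuid.uuid4()), "name": name}
    if equivalent_to is not None:
        record[EQUIV_ATTR] = equivalent_to["uuid"]
    return record


def root_identity(record, catalog):
    """Follow equivalence assertions until reaching a record that makes none."""
    seen = set()
    current = record
    while EQUIV_ATTR in current:
        if current["uuid"] in seen:
            raise ValueError("cycle in equivalence assertions")
        seen.add(current["uuid"])
        current = catalog[current[EQUIV_ATTR]]
    return current["uuid"]


def scientifically_equivalent(a, b, catalog):
    """Two records are treated as equivalent if they resolve to the same root."""
    return root_identity(a, catalog) == root_identity(b, catalog)


# Illustrative file names only.
original = new_record("granule.hdf")
reformatted = new_record("granule.nc", equivalent_to=original)
recompressed = new_record("granule_gz.nc", equivalent_to=reformatted)
unrelated = new_record("other.hdf")
catalog = {r["uuid"]: r for r in (original, reformatted, recompressed, unrelated)}
```

Note that this relies entirely on trusting the agent's assertion; nothing in the sketch checks that the contents actually match.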

On Oct 13, 2010, at 8:33 AM, Curt Tilmes wrote:

> You can argue that coming up with a C(x) canonicalization isn't
> practical for our data (I won't even disagree :-) I sure don't want to
> do it myself), but your paper doesn't present that argument, or even
> address the point.  Your conclusion simply assumes it is true.
> 
> As Altman demonstrates for his field, it is certainly conceivable.
> 
> I'm also not certain that we have to develop something that "applies
> to all Earth science data" to be useful.  Perhaps we can come up with
> something reasonable for a subset, for example, annotated files in one
> of the self-describing formats (HDF/NetCDF/etc.) where the annotations
> can contribute to the canonicalization process (i.e. you tag text
> fields with a property that says "case-insensitive canonicalization of
> this field will maintain scientific equivalence").
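
The annotation idea quoted above could look roughly like the following Python sketch. It assumes each text field carries a boolean tag saying whether case-insensitive canonicalization preserves scientific equivalence; the function name, the field layout, and the choice of SHA-256 over a sorted JSON serialization are illustrative assumptions, not an existing HDF or NetCDF feature.

```python
import hashlib
import json


def canonical_digest(fields):
    """Hash a set of annotated fields after canonicalizing them.

    `fields` maps a field name to a (value, case_insensitive) pair.
    The case_insensitive flag is a hypothetical annotation meaning
    "lower-casing this field does not change its scientific content".
    """
    canonical = {}
    for name, (value, case_insensitive) in fields.items():
        if case_insensitive and isinstance(value, str):
            value = value.lower()
        canonical[name] = value
    # Sorted keys and fixed separators make the serialization order-independent.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative values only.
file_a = {"platform": ("AQUA", True), "units": ("K", False)}
file_b = {"units": ("K", False), "platform": ("Aqua", True)}
file_c = {"platform": ("Aqua", True), "units": ("k", False)}

same = canonical_digest(file_a) == canonical_digest(file_b)
different = canonical_digest(file_a) != canonical_digest(file_c)
```

Here file_a and file_b differ only in field order and in the case of a field tagged as case-insensitive, so they hash identically, while file_c changes the case of an untagged field and therefore does not.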

--
Dr. Christopher Lynnes    NASA/GSFC, Code 610.2, Greenbelt, MD 20771
Phone: 301-614-5185


