[Esip-preserve] [FOO] Foo Project moves to Google Spreadsheets

alicebarkstrom at frontier.com alicebarkstrom at frontier.com
Tue Oct 12 09:21:59 EDT 2010


The suggestion that Ken has of embedding UUIDs in the file at least
makes them unique and independent of cryptographic digests.  That means
as long as a copy of the original file is readable, it can be uniquely
identified.  What happens after the original file becomes unreadable
(physical deterioration, software obsolescence, hardware obsolescence
being prime suspects) is not so clear.
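
A minimal sketch of that distinction, assuming a toy file layout in which
the identifier sits in a first header line (the layout and the helper names
make_granule, read_uuid and digest are hypothetical, not anything GHRSST or
Ken has specified).  It shows that re-encoding the contents changes the
cryptographic digest while the embedded UUID survives:

```python
import hashlib
import uuid


def make_granule(payload: bytes, granule_id: uuid.UUID) -> bytes:
    """Return file bytes with the identifier embedded in a header line.

    The 'uuid=<value>' header is an assumed, illustrative layout.
    """
    return b"uuid=" + str(granule_id).encode("ascii") + b"\n" + payload


def read_uuid(file_bytes: bytes) -> uuid.UUID:
    """Recover the embedded identifier from the first line of the file."""
    header, _, _ = file_bytes.partition(b"\n")
    return uuid.UUID(header[len(b"uuid="):].decode("ascii"))


def digest(file_bytes: bytes) -> str:
    """SHA-256 digest of the complete file, header included."""
    return hashlib.sha256(file_bytes).hexdigest()


granule_id = uuid.uuid4()  # assigned once, when the granule is created

# The same values written twice with different formatting, as might
# happen when a file is migrated to a newer format.
original = make_granule(b"1.0,2.0,3.0\n", granule_id)
migrated = make_granule(b"1.0, 2.0, 3.0\n", granule_id)

same_identity = read_uuid(original) == read_uuid(migrated)
same_digest = digest(original) == digest(migrated)
```

Here same_identity is true and same_digest is false.  Note that read_uuid
still needs a readable copy of the file, which is the limitation raised
above.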

I've no objection to trying DOIs - but what we mean by a collection
needs clarification.  I agree that Curt's simple example is very useful
and adds to our understanding of that issue.  The critical issue here
is the number of entities required for precise citation.  As a concrete
set of questions, we need to state whether a citation needs to be
sensitive to:
1.  The archive from which the data have been obtained (I'm curious
      as to whether Ken's example on this will have multiple locations
      where data will be stored.)
2.  The Data Product or ESDT - meaning very generic collections
3.  The Data Source - particularly if there are multiple sources,
      such as instruments; in the case of data collections that have
      many kinds of input data, is identifying each source critical?
4.  The Version of the algorithms (as well as input parameters, such
      as calibration coefficients)
5.  A carefully selected set of files that were used in such instances
      as validation by intercomparison with a field experiment
6.  Data inside files - as might be the case where the use of the data
      involved a small geographic or temporal sample within a larger
      region
7.  Specific identification of data inside a large number of files
      (this comes up in attempts to construct records of the "solar
       constant", from which the authors of the reconstruction
       try to determine whether or not there is a trend in that
       value)
I'll note the paper on scholarly citation of quantitative data
(Altman and King) appears to me to squash all of these levels of
precision together in an unacceptable way.
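
One way to keep those levels apart is to give each its own field in a
citation record.  The sketch below is only illustrative: the class
DataCitation, its field names, and the example values are all assumptions
made for this message, not an agreed schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical citation record with one field per level of precision."""

    archive: Optional[str] = None   # item 1: where the data were obtained
    product: Optional[str] = None   # item 2: Data Product or ESDT
    source: Optional[str] = None    # item 3: instrument or other data source
    version: Optional[str] = None   # item 4: algorithm / parameter version
    files: Tuple[str, ...] = ()     # item 5: selected set of files
    subset: Optional[str] = None    # items 6 and 7: selection inside file(s)


def levels_specified(citation: DataCitation) -> List[str]:
    """List the levels that this citation actually pins down."""
    return [f.name for f in fields(citation) if getattr(citation, f.name)]


# A collection-level citation versus a more precise one (made-up values).
collection_only = DataCitation(product="EXAMPLE_PRODUCT")
precise = DataCitation(
    archive="Example Archive",
    product="EXAMPLE_PRODUCT",
    version="2",
    files=("example-granule-id-1", "example-granule-id-2"),
)

print(levels_specified(collection_only))
print(levels_specified(precise))
```

A record like this makes it explicit which of the seven questions a given
citation answers and which it leaves open.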

I'll have some additional material - particularly on Chris' comments,
which I think begin to help provide some rigor to the questions we're
going to have to ask ourselves.

Bruce B.
----- Original Message -----
From: "Kenneth S. Casey" <Kenneth.Casey at noaa.gov>
To: "Curt Tilmes" <Curt.Tilmes at nasa.gov>
Cc: "ESIP Preservation cluster" <esip-preserve at rtpnet.org>
Sent: Tuesday, October 12, 2010 6:54:37 AM
Subject: Re: [Esip-preserve] [FOO] Foo Project moves to Google Spreadsheets


Curt - I've been quiet on this list owing to lack of time to respond, but I did want to say that I think your FOO analysis is excellent and I've been following it closely!  Thanks so much. I think this approach is (1) extremely informative and (2) very comforting... comforting because the Group for High Resolution SST ( http://ghrsst.org ), a big international effort I participate in that produces standardized SST data from multiple satellites around the world, just decided to use UUIDs embedded in a netCDF attribute for every single granule. It also plans to use DOIs for the collections being generated by its network of data providers.  The UUID usage is part of the new "GHRSST Data Specification Version 2 (GDS2)", which was just published on October 1st.  GDS version 1 produced about 30 collections and 1.5 million netCDF files, and there will be even more GDS2 data in just the next few years... so, we'll see how well this approach works...


Ken 







On Oct 11, 2010, at 2:52 PM, Curt Tilmes wrote: 




This is a read/only link: 

https://docs.google.com/leaf?id=0BztPCL0EZx_3NWI4OTQwN2ItMjU3OC00ZGIwLWFlNjUtNWY1OTE3MGJjNDUw&hl=en 

Mostly cut/pasted from my earlier emails. 

I did add one additional file, a "FOOLUT" lookup table that is an 
input to APP_L2.  We can change the version of the APP independently 
from the version of the LUT and explore the various provenance graphs. 

I'm attaching a basic data flow diagram.  (I also pasted an SVG 
version of this diagram into the spreadsheet, but it doesn't come 
through every browser.) 

I'm also trying to always include the "[FOO]" tag so you can filter 
your ESIP-Preserve list if you aren't following this scenario and 
want to trim down the clutter. 

So far, this simple scenario has: 

1. Shown how DOI works well to identify and locate "ESDT+Collection". 

2. Shown how DOI doesn't precisely identify sets of granules. 

3. Shown how UUID can be used to unambiguously refer to individual granules. 

Curt 
<flow.png> 
_______________________________________________ 
Esip-preserve mailing list 
Esip-preserve at lists.esipfed.org 
http://www.lists.esipfed.org/mailman/listinfo/esip-preserve 




[NOTE: The opinions expressed in this email are those of the author alone and do not necessarily reflect official NOAA, Department of Commerce, or US government policy.] 


Kenneth S. Casey, Ph.D. 
Technical Director 
NOAA National Oceanographic Data Center 
1315 East-West Highway 
Silver Spring MD 20910 
301-713-3272 ext 133 
http://www.nodc.noaa.gov/ 






