[Esip-preserve] Example for discussion

Thu Oct 7 16:17:12 EDT 2010

Ok, backing off from the real world, here is a totally contrived
example scenario, for illustrative and discussion purposes:

NASA launches on 2000-01-01 the "FOO" instrument.  It captures 1
granule per month of data.  It flies for 1 year, capturing 12
granules.  (I originally had it die here, but for grins, let's say that
today is 2001-01-01, so a month from now, we'll get another granule --
This is an "Open Data Set".)

The processing flow has these ESDTs:

FOOL0 - 1 Month of Level 0 data
FOOCAL - Calibration data produced by the CAL team used to calibrate the 
data
FOOL1B - 1 Month of Level 1 calibrated data
FOOL2 - 1 Month of Level 2 data with some geophysical parameters retrieved.
FOOL3 - 1 year of gridded data.

We have 3 PGEs: FOOL1BP, FOOL2P, FOOL3P.

Our LOCALGRANULEIDs look like this:

<ESDT>.<granule number> for Level 0
<ESDT>.v<collection>    for calibration
and
<ESDT>.v<collection>.<granule number> for the others
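
As a rough sketch, the three LOCALGRANULEID patterns above could be parsed
like this.  The function and type names are hypothetical, invented just for
this illustration.  Note one assumption: the calibration pattern is written
with a "v", but the calibration file names used in this example (FOOCAL.1,
FOOCAL.2) omit it, so the sketch accepts both forms.

```python
import re
from typing import NamedTuple, Optional


class GranuleId(NamedTuple):
    esdt: str
    collection: Optional[int]  # None for Level 0
    granule: Optional[int]     # None for calibration files


# Hypothetical patterns derived from the three formats listed above.
_L0 = re.compile(r"(FOOL0)\.(\d+)")
_CAL = re.compile(r"(FOOCAL)\.v?(\d+)")  # assumption: the "v" is optional
_OTHER = re.compile(r"(FOOL1B|FOOL2|FOOL3)\.v(\d+)\.(\d+)")


def parse_granule_id(name: str) -> GranuleId:
    """Split a LOCALGRANULEID into ESDT, collection, and granule number."""
    m = _L0.fullmatch(name)
    if m:
        return GranuleId(m.group(1), None, int(m.group(2)))
    m = _CAL.fullmatch(name)
    if m:
        return GranuleId(m.group(1), int(m.group(2)), None)
    m = _OTHER.fullmatch(name)
    if m:
        return GranuleId(m.group(1), int(m.group(2)), int(m.group(3)))
    raise ValueError(f"not a recognized LOCALGRANULEID: {name!r}")
```

For instance, parse_granule_id("FOOL2.v2.7") gives ESDT "FOOL2", collection
2, granule 7.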

We start from these files:

FOOL0.1
FOOL0.2
...
FOOL0.12

FOOCAL.1   (we've only calibrated once)

We run FOOL1BP 12 times, each run taking one of the L0 granules plus the
CAL file as input, and produce these files:

FOOL1B.v1.1
FOOL1B.v1.2
...
FOOL1B.v1.12

Then we run FOOL2P 12 times, once on each of those, producing these:

FOOL2.v1.1
FOOL2.v1.2
...
FOOL2.v1.12

Finally, we run FOOL3P once, reading all of those and producing a single
file:

FOOL3.v1.1

Now the cal guys do their magic and come up with a better FOOCAL.2, so
we do a reprocessing into collection 2, producing 25 more files:

FOOL1B.v2.1
FOOL1B.v2.2
...
FOOL1B.v2.12

FOOL2.v2.1
FOOL2.v2.2
...
FOOL2.v2.12

FOOL3.v2.1
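
To double-check the count of 25, here is a small sketch that enumerates the
output files of one processing run for a given collection.  The function
name is made up for this example, and it assumes the 12 input granules
described above.

```python
from typing import List


def outputs_for_collection(collection: int, n_granules: int = 12) -> List[str]:
    """List the files one full processing run produces for a collection."""
    files: List[str] = []
    # FOOL1BP and FOOL2P each produce one output file per input granule.
    for esdt in ("FOOL1B", "FOOL2"):
        for granule in range(1, n_granules + 1):
            files.append(f"{esdt}.v{collection}.{granule}")
    # FOOL3P reads all of the L2 granules and produces a single yearly file.
    files.append(f"FOOL3.v{collection}.1")
    return files


print(len(outputs_for_collection(2)))  # 12 + 12 + 1 = 25
```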

======================================================================

Ok, all of those files go into the archive.

Now we are looking for some good identifiers.

We're proposing that DOIs be used to identify "ESDT+Collection", so let's
assign some:

doi:10.9999/US/FOOL0     => FOOL0
doi:10.9999/US/FOOL1B.v1 => FOOL1B, collection 1
doi:10.9999/US/FOOL1B.v2 => FOOL1B, collection 2
doi:10.9999/US/FOOL2.v1  => FOOL2,  collection 1
doi:10.9999/US/FOOL2.v2  => FOOL2,  collection 2
doi:10.9999/US/FOOL3.v1  => FOOL3,  collection 1
doi:10.9999/US/FOOL3.v2  => FOOL3,  collection 2
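
The assignment above follows a simple rule, which could be sketched as
follows.  The helper name is hypothetical, and "10.9999/US" is just the
made-up prefix from this example, not a real registered one.

```python
from typing import Optional

DOI_PREFIX = "10.9999/US"  # made-up prefix used throughout this example


def doi_for(esdt: str, collection: Optional[int] = None) -> str:
    """Build the DOI for an ESDT, or for an ESDT+Collection pair."""
    # Level 0 has no collection, so its DOI is just the ESDT name.
    suffix = esdt if collection is None else f"{esdt}.v{collection}"
    return f"doi:{DOI_PREFIX}/{suffix}"
```

So doi_for("FOOL0") gives "doi:10.9999/US/FOOL0" and doi_for("FOOL1B", 2)
gives "doi:10.9999/US/FOOL1B.v2".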

Since these are "Open Data Sets", those identifiers aren't sufficient
to determine the precise set of granules used (right now, they map to
12 granules, but next month they will map to 13 granules).
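
To make the "open" part concrete, here is a minimal sketch of how the
number of granules behind one of those DOIs depends on the date you ask.
It assumes (my assumption, not something stated above) that each monthly
granule becomes available on the first day of the following month.

```python
from datetime import date

LAUNCH = date(2000, 1, 1)  # launch date from the scenario


def granules_available(today: date) -> int:
    """Count the monthly granules captured between launch and `today`."""
    # Whole months elapsed since launch; each completed month is one granule.
    months = (today.year - LAUNCH.year) * 12 + (today.month - LAUNCH.month)
    return max(months, 0)


print(granules_available(date(2001, 1, 1)))  # 12
print(granules_available(date(2001, 2, 1)))  # 13
```

The same identifier resolves to a different set of granules on each of
those two dates, which is exactly the problem.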

What they are useful for is referring to the general set of granules,
locating them, and even (while not perfect) citing them.

For example, you can put "http://dx.doi.org/10.9999/US/FOOL2.v1" into
your browser and it will take you right to the web page at the archive
with collection level metadata about that ESDT and information on how
to download the data, etc.
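
Turning a "doi:" identifier into that kind of browser link is a plain
string operation.  A small sketch (the function name is hypothetical, and
nothing here contacts the resolver):

```python
def doi_to_url(doi: str, resolver: str = "http://dx.doi.org/") -> str:
    """Convert 'doi:10.9999/US/FOOL2.v1' into a resolver URL."""
    # Drop the "doi:" scheme prefix if present, then append to the resolver.
    if doi.startswith("doi:"):
        doi = doi[len("doi:"):]
    return resolver + doi


print(doi_to_url("doi:10.9999/US/FOOL2.v1"))
# http://dx.doi.org/10.9999/US/FOOL2.v1
```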

If you write a paper using the L2 data, you can add
"doi:10.9999/US/FOOL2.v2" into your data citation and someone else can
follow it later back to that data.

We still have to solve several other problems, but that is a start...

I'll use this example to talk about some of the other use cases later.

Curt