[Esip-preserve] Some Further Comments on ESDT's/Data Products and Sampling Strategies

Sat Dec 11 13:49:37 EST 2010

After a bit of further thinking, here are a few more properties
of what I've called "Data Products":

1.  If you get a program set up that properly reads the format
for a Data Product, you should have a reasonable expectation
that the program will work on all the files in the Data Product,
getting the proper data structure represented in your computer,
which is expected to be constant (with possible variants that
the data structure reading program needs to know about and
take into account to be judged as reading the files "properly").

2.  This property is very important to operational Earth science
data users because they don't want to take the time to rewrite
the read program every time they get a new file.  On the other
hand, they will usually have to write a new program for each kind
of Data Product because the sampling structure or parameters
are different.  A monthly, regional average data product will not
have locations for individual "pixels" and will not need to have
a measurement time stamp attached.  Rather it will probably
have a gridded data structure that is quite different from the
original instrument data sampling.  You can't use the same
read program on the monthly average sampling structure as
you can for the instantaneous data.
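
As a rough illustration of points 1 and 2, here is a minimal Python
sketch of "one read program per Data Product".  Everything in it is
an assumption made up for the example: the product names INSTANT_L2
and MONTHLY_L3, the comma-separated text layouts, and the field
names do not come from any real archive.  The point is only that
the two sampling structures land in different data structures, so
each needs its own reader.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class InstantaneousSample:
    """One 'pixel': it carries its own time stamp and location."""
    time: str
    lat: float
    lon: float
    value: float


@dataclass
class MonthlyGrid:
    """A monthly regional average: no per-pixel time or location,
    just grid cell edges and one mean value per cell."""
    month: str
    lat_edges: List[float]
    lon_edges: List[float]
    means: List[List[float]]


def read_instantaneous(text: str) -> List[InstantaneousSample]:
    # Hypothetical layout: one "time,lat,lon,value" record per line.
    samples = []
    for line in text.strip().splitlines():
        time, lat, lon, value = line.split(",")
        samples.append(
            InstantaneousSample(time, float(lat), float(lon), float(value))
        )
    return samples


def read_monthly_grid(text: str) -> MonthlyGrid:
    # Hypothetical layout: month, latitude edges, longitude edges,
    # then one line of cell means per latitude band.
    lines = text.strip().splitlines()
    month = lines[0]
    lat_edges = [float(x) for x in lines[1].split(",")]
    lon_edges = [float(x) for x in lines[2].split(",")]
    means = [[float(x) for x in row.split(",")] for row in lines[3:]]
    return MonthlyGrid(month, lat_edges, lon_edges, means)


# One reader per Data Product; every file in a product uses the same one.
READERS: Dict[str, Callable[[str], object]] = {
    "INSTANT_L2": read_instantaneous,
    "MONTHLY_L3": read_monthly_grid,
}


def read_file(product: str, text: str):
    """Dispatch to the reader registered for this Data Product."""
    return READERS[product](text)
```

Handing a monthly file to read_instantaneous fails with a ValueError
on the first line, because that line does not have the four fields
the instantaneous reader expects.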

3.  This feature of Earth science data makes the objects and
collection structure different from the objects and structures
familiar in the library world - or for that matter from the new
objects and structures created by electronic book readers.
With the latter, you don't need a new program for every book
because the underlying structure of a book (as a collection
of text and figures, say) and the data format to read it are
not changed when you insert a new e-book.  A rough analogy
for data products is that each data product has a different
language - and you have to develop a translation to your
computer's word structure to read it with the original
intent.

4.  Sampling structures also carry resolution and error
propagation with them.  If we were dealing with in situ
measurements that are reasonably close to spatial
points, then the GCMD definition of the term "resolution"
might be appropriate: "the minimum distance (in time
or space) between two points".  Even here, it may be
scientifically important to know how the resolution of
measurements varies within a vertical profile or a spatial
network.  With remotely sensed measurements, resolution
is bound to the sampling, measurement error, and the
algorithms used in data interpretation.  The classic
papers on this were done by Backus and Gilbert back
in the 1960's (there are three in the Philosophical
Transactions of the Royal Society - I'll see if I can dig
out the references next week).  In the third of their
papers, they show that there is a tradeoff between precision
and resolution that depends on how the spatial structure
was sampled and the measurement noise.  In this
context, resolution varies from one point to another
and means how far apart two features must be in order
to be reliably distinguished.  This meaning of
resolution is also used widely in discussions of video
and audio signal processing.  It does mean that to
be careful, resolution statements for many kinds of
remotely sensed data will need the Point Spread
Function of the instrument - a key item for the data
Context in OAIS classification.
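
To make the tradeoff in point 4 concrete, here is a much simpler
stand-in than the Backus and Gilbert formalism: a plain boxcar
average over evenly spaced samples.  It assumes independent noise
with the same standard deviation on every sample, and the function
name and arguments are made up for the illustration.  Averaging
more samples lowers the noise on the result but widens the
distance over which features are blurred together.

```python
import math


def boxcar_tradeoff(sigma: float, spacing: float, n_samples: int):
    """Return (resolution_length, noise_std) for a boxcar average.

    Assumptions for this sketch: samples are evenly spaced by
    `spacing`, and each has independent noise of standard
    deviation `sigma`.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # The average smears features over the width of the window.
    resolution_length = n_samples * spacing
    # Independent noise averages down as 1 / sqrt(n).
    noise_std = sigma / math.sqrt(n_samples)
    return resolution_length, noise_std


# Widening the window from 1 to 4 samples halves the noise
# but makes the resolution length four times coarser.
single = boxcar_tradeoff(sigma=1.0, spacing=10.0, n_samples=1)
averaged = boxcar_tradeoff(sigma=1.0, spacing=10.0, n_samples=4)
```

A real instrument would replace the boxcar with its Point Spread
Function, and the noise would not generally be independent from
sample to sample, so this only shows the direction of the tradeoff.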

5.  For cultural relaxation, it might be interesting to
pick up a copy of Umberto Eco's "Kant and the Platypus"
as an exercise in the underlying assumptions regarding
classification.

Bruce B.