[Esip-preserve] Provenance Issues

Bruce Barkstrom brbarkstrom at gmail.com
Thu Oct 18 11:21:34 EDT 2012

I was doing some work on radiosonde data from IGRA
and wanted to get the appropriate relationship between
temperature and saturation vapor pressure over liquid
water and ice.  I ran across a program listing that implements
about fifteen different formulas for each kind of surface.  By
and large they all agree closely - but since these relationships
are critical in deriving the vertical and horizontal structure
of the atmosphere, it would seem appropriate to know
exactly what's being used.  I'll note that the recommendations
for the proper formula in some of the WMO documents
had typographical errors - and the revised version didn't
entirely correct the problem, although they apparently did
get the final version right.  In examining the code for doing
the computations, it was apparent that the code could
embed the investigator's best judgment about which formula
to use and apply that judgment automatically in the
calculations, without necessarily recording the choice.  The decision can
be complex, particularly in dealing with temperatures
below freezing - and it's pretty important to know whether
the data are being reduced with respect to dew point
(which probably refers to liquid water) or frost point.
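
As a rough illustration of the concern, here is a minimal Python
sketch of a routine that picks a saturation vapor pressure formula
by temperature and returns a note of which one it used.  The
Magnus-type coefficients are values commonly quoted from the WMO
CIMO Guide; they are assumptions here and should be checked against
the authoritative text.  The function, class, and field names are
hypothetical and do not come from the program listing mentioned
above.

```python
import math
from typing import NamedTuple

# Magnus-type coefficients (a, b, c) for e_s = a * exp(b * t / (c + t)),
# with t in degrees Celsius and e_s in hPa.
# ASSUMED values for illustration; verify against the source before use.
MAGNUS = {
    "liquid_water": (6.112, 17.62, 243.12),
    "ice": (6.112, 22.46, 272.62),
}


class VaporPressure(NamedTuple):
    value_hpa: float
    surface: str    # which formula was actually applied
    chosen_by: str  # "caller" or "automatic"


def saturation_vapor_pressure(t_celsius, surface=None):
    """Return e_s together with a record of the formula that produced it."""
    chosen_by = "caller"
    if surface is None:
        # This is the automatic decision that would otherwise go unrecorded.
        surface = "ice" if t_celsius < 0.0 else "liquid_water"
        chosen_by = "automatic"
    a, b, c = MAGNUS[surface]
    value = a * math.exp(b * t_celsius / (c + t_celsius))
    return VaporPressure(value, surface, chosen_by)
```

Calling saturation_vapor_pressure(-10.0) applies the ice formula and
tags the result "automatic"; passing surface="liquid_water" at the
same temperature gives a larger value tagged "caller".  Carrying the
tag with the number is one way to keep the dew point versus frost
point distinction visible downstream.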

Here, then, are some "interesting" issues related to
provenance tracking and citations:

1.  Do we have a recommended practice on how to
document choices of algorithms made automatically
by computer programs during data reduction?

[Note the scaling issue here, in which the number of
radiosonde stations is fairly large, and may be much
larger when dealing with other data collections.]

2.  How should we document the changing history of
data reduction, including changes to the recommended
formulas and coefficients?

[In the radiosonde data, there's about a sixty-year
record of daily or twice-daily balloon launches, in
which the contributors range from developed countries
such as the U.S. and those of Europe to some of the
poorest nations of Africa.  The radiosonde models
vary and have gone through many revisions and changes
of technology.  So have the equations.]

3.  The latest issue of IEEE Computer has
several articles on Dynamic Software Product Lines.
In systems that use this kind of technology, the
designers plan on variation points and allow runtime
reconfiguration of the modules.  Have we had any
thoughts on what this model of computation does
to provenance tracking?
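
One possible shape for an answer to the questions above, sketched
in Python under stated assumptions: a variation point is modeled as
a named slot whose implementation can be rebound at run time, every
binding is appended to a log, and every result is returned with the
name of the variant that produced it.  The class and attribute names
are hypothetical; this is not taken from the Computer articles or
from any existing provenance tool.

```python
from datetime import datetime, timezone


class VariationPoint:
    """A named slot whose implementation can be swapped at run time."""

    def __init__(self, name, variant_name, func):
        self.name = name
        self.log = []  # provenance trail: one entry per binding
        self.bind(variant_name, func)

    def bind(self, variant_name, func):
        """Install a variant and record when it took effect."""
        self._variant = variant_name
        self._func = func
        timestamp = datetime.now(timezone.utc).isoformat()
        self.log.append((timestamp, self.name, variant_name))

    def __call__(self, *args, **kwargs):
        # Return the result together with the variant that produced it.
        return self._func(*args, **kwargs), self._variant
```

The log grows with the number of reconfigurations rather than with
the number of observations, which may help with the scaling concern
raised under question 1.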

Bruce B.
