[Esip-documentation] definitive data set identification

Nan Galbraith ngalbraith at whoi.edu
Thu Jan 21 09:27:28 EST 2021


 Hi all - 

The OceanSITES data management team is hoping to solve a problem 
with identifying duplicate or secondary instances of data sets on our 
servers. We work with in situ observational data sets, which are often 
used by modelers and remote sensing systems. If these users unknowingly
access duplicate copies of data, their results may be skewed because
those data points are given more weight than they should have.

We originally tried to ensure that we had only one copy of any given 
data point on our server, but that hasn't proved to be practical. Certain 
kinds of computed data sets, like PCO2 and surface fluxes, are more 
useful to end users if the files contain copies of the component observed
data variables used in their calculations. These copies may differ from
the originals from the start, for example by being gridded or averaged to
a different sampling rate to match the time base of the related data. Or
they may drift apart over time, as the original data change slightly when
calibrations, algorithms, or clock adjustments are updated.

My question to the documentation cluster is whether you know of
any community standards that identify a given data variable as the
authoritative or 'original' copy. I haven't encountered any kind of
standard for this, but I may not be looking in the right places. I suspect
there may be a solution involving DOIs, but it would only be meaningful
if we acquired a DOI for each observed variable in a data set, and if our
data users knew about the scheme and were prepared to use it.

Any ideas on this would be very welcome; whenever possible, we try to
adopt existing standards instead of inventing our own one-off solutions.

Thanks in advance - 
Nan Galbraith


-- 
*******************************************************
* Nan Galbraith        Information Systems Specialist *
* Upper Ocean Processes Group            Mail Stop 29 *
* Woods Hole Oceanographic Institution                *
* Woods Hole, MA 02543                 (508) 289-2444 *
*******************************************************

