[ESIP-all] Data Quality Summit

Robert G Raskin Robert.G.Raskin at jpl.nasa.gov
Wed Jul 5 03:45:11 EDT 2006

This is a reminder of the Data Quality Summit to be held at the ESIP 
Federation Meeting, Tuesday 7/18 from 9:30am-6pm (as part of 
the Technical Workshop).

The format will include breakouts by end use (real-time 
applications, education, science, Climate Data Records, etc.).  The 
objective is to define minimum standards for each end use.  
Datasets in the Earth Information Exchange can be subsequently 
classified as meeting some (or all) of these quality standards.  The 
many possible definitions and dimensions of data quality will be 
presented in a plenary session prior to the breakouts. 

A description of the breakout for Climate Data Records (which has 
the most stringent quality standards of all) is presented below. 
Standards for other applications are quite different and include 
other criteria, such as data availability and usability for the task.  
Some breakout leaders are still needed (contact me if you have an 
interest in playing such a role).


------- Climate Data Records (B. Barkstrom, J. Bates) ------
For Climate Data Records (CDRs), the scientific maturity of the
data must be captured in measures of data quality that appear as
metadata.  There are three measures of maturity that appear
to be critical:
- The quantification of field variability and field variability changes
- The certification that the history of data production and 
validation is accessible to public view and based on physical 
understanding of the measurement process
- The visibility of the connection between the CDR and peer 
reviews of its quality and usefulness with respect to other options 
for obtaining similar information

It is clear that a scientific record of climate and its variability is not 
mature until there is a publicly accessible understanding of that
variability.  In the most primitive understanding, only lower and 
upper bounds are available.  At a more mature level, variability can 
be described in terms of Probability Density Functions that would
allow quantitative risk assessments of various benefits and 
hazards. Scientific understanding and societal benefit increasingly 
depend on reliable separation of routine climate variability from 
climate changes. In an early state of maturity, climate change 
might appear as a linear trend.  In a later state of maturity, climate 
change may appear as non-linear systematic behavior with a 
quantified specification of the probability of different time and 
space histories.
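
The progression described above (bounds only, then a probability 
density, with change first showing up as a linear trend) can be 
sketched in a few lines of Python.  This is only an illustration 
under stated assumptions: the anomaly series is synthetic, the 
function names are hypothetical, and a simple histogram and a 
least-squares line stand in for whatever estimators a real CDR 
would use.

```python
import random
import statistics

def bounds(values):
    """Most primitive description of variability: lower and upper bounds."""
    return min(values), max(values)

def empirical_pdf(values, n_bins=10):
    """Histogram estimate of a probability density function.

    Returns (bin_edges, densities); the densities integrate to 1.
    Assumes the values are not all identical.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value falls in the last bin.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    n = len(values)
    edges = [lo + k * width for k in range(n_bins + 1)]
    densities = [c / (n * width) for c in counts]
    return edges, densities

def linear_trend(times, values):
    """Ordinary least-squares slope and intercept of values against times."""
    t_mean = statistics.fmean(times)
    v_mean = statistics.fmean(values)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean

# Synthetic series (an assumption, not real data):
# a 0.02-per-year trend plus Gaussian noise.
rng = random.Random(0)
years = list(range(1980, 2006))
anomalies = [0.02 * (y - 1980) + rng.gauss(0.0, 0.1) for y in years]

# Early maturity: climate change appears as a linear trend.
slope, intercept = linear_trend(years, anomalies)

# Routine variability is what remains after the trend is removed.
residuals = [a - (slope * y + intercept) for y, a in zip(years, anomalies)]
low, high = bounds(residuals)                   # primitive description
edges, densities = empirical_pdf(residuals, 8)  # more mature description
```

The non-linear, space-time case described as the later state of 
maturity is beyond what this sketch attempts.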

For the second maturity measure, a data product cannot be
regarded as mature until its provenance is well-established.  In 
detail, this means that there needs to be a record that ties the data 
to a physical understanding of the measurement process, and that
can certify how the data were produced and how much uncertainty
remains.  This means that data providers will need to keep records
that establish the relationship between instrument physics and 
data, and that allow interested individuals to understand the 
process of validation.
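
As a rough illustration of what such records might hold, the 
sketch below models a provenance record and reports which 
elements are still missing.  The class names, field names, and 
identifiers are hypothetical and do not follow any existing 
metadata standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ProcessingStep:
    """One step in the history of data production (hypothetical fields)."""
    description: str
    software_version: str

@dataclass
class ProvenanceRecord:
    """Ties a dataset to its measurement physics, production, and validation."""
    dataset_id: str
    measurement_physics_ref: str = ""             # pointer to instrument physics
    uncertainty_estimate: Optional[float] = None  # remaining uncertainty
    processing_steps: List[ProcessingStep] = field(default_factory=list)
    validation_refs: List[str] = field(default_factory=list)

    def missing_items(self) -> List[str]:
        """List the provenance elements that have not been recorded yet."""
        missing = []
        if not self.measurement_physics_ref:
            missing.append("measurement_physics_ref")
        if self.uncertainty_estimate is None:
            missing.append("uncertainty_estimate")
        if not self.processing_steps:
            missing.append("processing_steps")
        if not self.validation_refs:
            missing.append("validation_refs")
        return missing

# Placeholder identifiers, for illustration only.
record = ProvenanceRecord(
    dataset_id="example-cdr-001",
    measurement_physics_ref="example-algorithm-document",
    processing_steps=[ProcessingStep("calibration", "v1.0")],
)
gaps = record.missing_items()
```

Under this sketch, a product would count as having well-established 
provenance only when `missing_items()` returns an empty list.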

Finally, our record of data quality requires that we develop the
ability to record and track considered scientific judgements of
data quality and usefulness.  The mechanisms for doing so are
not currently well-developed, since they require us to consider
some of these judgements as "external metadata" that point from
the peer review back to the data.  There is a clear parallel between
this situation and that in humanities libraries, where a long 
tradition of scholarly comment has produced standards for 
commentaries and related critical commentary.  The parallel also 
suggests that as this field develops, there will be a need for a 
discipline of "scientific data scholarship".
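
One way to picture "external metadata" is as review records that 
live outside the dataset and carry a pointer back to it.  The 
sketch below is an assumption about how that could look, with 
hypothetical field names and placeholder identifiers rather than 
real reviews.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ReviewRecord:
    """A considered scientific judgement stored outside the dataset itself."""
    review_id: str
    dataset_id: str   # the pointer from the peer review back to the data
    judgement: str    # summary of the assessed quality and usefulness

def index_by_dataset(reviews):
    """Group reviews by the dataset they point to."""
    index = defaultdict(list)
    for review in reviews:
        index[review.dataset_id].append(review)
    return dict(index)

# Placeholder records, for illustration only.
reviews = [
    ReviewRecord("review-1", "example-cdr-001", "suitable for trend analysis"),
    ReviewRecord("review-2", "example-cdr-001", "calibration drift needs study"),
    ReviewRecord("review-3", "example-cdr-002", "limited spatial coverage"),
]
by_dataset = index_by_dataset(reviews)
```

Looking up a dataset identifier in `by_dataset` then returns every 
recorded judgement about that dataset, which is the tracking ability 
the paragraph above calls for.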
