[ESIP-all] Data Quality Summit
Robert G Raskin
Robert.G.Raskin at jpl.nasa.gov
Wed Jul 5 03:45:11 EDT 2006
This is a reminder of the Data Quality Summit to be held at the ESIP
Federation Meeting, Tuesday 7/18 from 9:30am-6pm (as part of
the Technical Workshop).
The format will include breakouts by end use (real-time
applications, education, science, Climate Data Records, etc.). The
objective is to define minimum standards for each end use.
Datasets in the Earth Information Exchange can subsequently be
classified as meeting some (or all) of these quality standards. The
many possible definitions and dimensions of data quality will be
presented in a plenary session prior to the breakouts.
A description of the breakout for Climate Data Records (which has
the most stringent quality standards of all) is presented below.
Standards for other applications are quite different and include
other criteria, such as data availability and usability for the task.
Some breakout leaders are still needed (contact me if you have an
interest in playing such a role).
-Rob
------- Climate Data Records (B. Barkstrom, J. Bates) ------
For Climate Data Records (CDRs), the scientific maturity of the
data must be captured in measures of data quality that appear as
metadata. Three measures of maturity appear to be critical:
- The quantification of field variability and of changes in that variability
- The certification that the history of data production and
validation is accessible to public view and based on physical
understanding of the measurement process
- The visibility of the connection between the CDR and peer
reviews of its quality and usefulness with respect to other options
for obtaining similar information
For the first maturity measure, a scientific record of climate and
its variability is not mature until there is a publicly accessible
understanding of that variability. At the most primitive level, only
lower and upper bounds are available. At a more mature level, variability can
be described in terms of Probability Density Functions that would
allow quantitative risk assessments of various benefits and
hazards. Scientific understanding and societal benefit increasingly
depend on reliable separation of routine climate variability from
climate change. In an early state of maturity, climate change
might appear as a linear trend. In a later state of maturity, climate
change may appear as non-linear systematic behavior with a
quantified specification of the probability of different time and
space histories.
For the second maturity measure, a data product cannot be
regarded as mature until its provenance is well established.
Specifically, there needs to be a record that ties the data to a
physical understanding of the measurement process and that can
certify how the data were produced and how much uncertainty
remains. In practice, data providers will need to keep records
that establish the relationship between instrument physics and
the data, and that allow interested individuals to understand the
process of validation.
Finally, for the third maturity measure, we need to develop the
ability to record and track considered scientific judgements of
data quality and usefulness. The mechanisms for doing so are
not yet well developed, since they require us to consider
some of these judgements as "external metadata" that point from
the peer review back to the data. There is a clear parallel between
this situation and that in humanities libraries, where a long
tradition of scholarly comment has produced standards for
commentaries and related critical commentary. The parallel also
suggests that as this field develops, there will be a need for a
discipline of "scientific data scholarship".