[Esip-preserve] Citations

Alice Barkstrom alicebarkstrom at verizon.net
Thu Apr 15 21:20:40 EDT 2010


I'll agree.  I found the NRC report and translated their categories
into a table (with some bulging at the seams).  Then, I added
in three categories that seem to fit - but still needed a fourth
for the operational data production (which is the one where latency
requirements force inclusion of the "perturbations" into the data
files because there isn't time to make the record homogeneous).

Also, I'm pretty concerned with the recording (or, in the provenance
world, the provenance "tracking") of the perturbations.  In my experience,
there's a lot of "tacit knowledge" that producers don't write down that
has a serious influence on the reproducibility of data production
algorithms.  Or, to put it another way, I don't think the ATBDs are
really high-fidelity recordings of what is actually going on in the
operational algorithms.  Put still another way, I expect that the
effort required to reconstruct an algorithm and really ensure that
it replicates what is being done operationally is about the same as
the effort required to develop the operational software - meaning
hundreds of person-hours for some of the operational or - more
importantly - the climate data record code.

Next steps on my part will be to extend the table to create
a context for the use cases that shows such impacts as
demands on the producers and demands on the accepting
archives - with - maybe - some comments on what users
experience.  Many of the comments I get are related to what
archives experience in trying to accommodate what they get
from producers.  The user experience is often something
different from either - and often our comments are not well
supported by empirical evidence from the actual user community.
I have strong opinions along the line that the IT community wants
to do certain things because they receive "good marks" from their
colleagues - whether or not the user community benefits.

Bruce B.

At 04:46 PM 4/15/2010, Mark A. Parsons wrote:
>I don't think there is necessarily a direct connection between 
>Bruce's paradigms and the NSB categories. While there are often 
>parallels, data sets can evolve through the NSB categories with use. 
>For example, the IPA Permafrost Map began as a research collection 
>and then, as it was improved and more consistently compiled, it became 
>a community collection. Now it is the benchmark of permafrost 
>distribution and is used by multiple disciplines as a reference 
>collection. All the while, it remains in Bruce's category 2.  So 
>while some comparison to the NSB categories is instructive, it isn't exact.
>
>Cheers,
>
>-m.
>On 15 Apr 2010, at 8:13 AM, Alice Barkstrom wrote:
>
> > I suspect that the production paradigms create a collection
> > organization structure that could stabilize our understanding and
> > ensure representativeness to the use cases we choose.  This kind of
> > structural work would also provide a checklist that could be used to
> > make it easier to classify the kind of cases we're dealing with.
> > I'll take a look at the NSB report and see if I can merge the
> > suggestion I made yesterday with that categorization.
> >
> > Bruce B.
> >
> > At 06:52 PM 4/14/2010, Ruth Duerr wrote:
> >> Actually, these descriptions correspond pretty well to the
> >> descriptions of research, resource, and reference collections in
> >> the report NSB (National Science Board). 2005. Long-Lived Digital
> >> Data Collections: Enabling Research and Education in the 21st
> >> Century. Washington, DC: National Science Foundation. 87 pp.,
> >> despite the fact that you are talking about production approaches
> >> and they are talking about types of data.
> >>
> >> Ruth
> >>
> >> On Apr 14, 2010, at 3:07 PM, Alice Barkstrom wrote:
> >>
> >> > It may be useful to deal with a simple separation of approaches to
> >> > production that incorporates the size of the groups involved:
> >> >
> >> > 1.  Single author production and publication - classic sociological
> >> > scenario that has supported a great deal of previous work
> >> >
> >> > Scenario: author collects measurements, analyzes the data, and writes
> >> > up a summary paper; data may be preserved on paper, or in electronic
> >> > files; peer-review accomplished by submission of paper to journal, with
> >> > a moderate number (three to five) of referees; data publication would
> >> > involve having paper or electronic copies of data accepted by a
> >> > library or data center
> >> >
> >> > 2.  Working group production and publication - field experiment (of a
> >> > variety of different kinds) would be a typical example
> >> >
> >> > Scenario: group sets up equipment, with single person in charge of each
> >> > instrument that will collect data, management of WG done by one or two
> >> > people (PI); data from individual instruments combined and intercompared
> >> > within the group; data preserved in electronic files - which may be
> >> > distributed amongst the WG; each instrument's scientist writes up a
> >> > paper on his or her data; peer-review accomplished by submission of
> >> > papers to a journal special issue and perhaps a special editor who
> >> > selects a fair number of referees; data publication requires formal
> >> > accession planning by a data center owing to the volume of data and
> >> > the cost of curation
> >> >
> >> > 3.  Large-scale production and publication - "Big Science" owing to the
> >> > size of the effort involved
> >> >
> >> > Scenario: instrument and producer teams selected by large scale proposal
> >> > effort - may involve one hundred to two hundred people over a decade;
> >> > long time period (5 years is typical) of preparation before data
> >> > collection begins, including design of production system and data
> >> > production software; substantial pre-collection peer-review, including
> >> > ATBDs and related algorithm outlines, as well as such documentation as
> >> > coordinate transformations, data formats, calibration plans and
> >> > procedures, etc.; production highly rigid, with extensive planning and
> >> > scheduling; periodic (two to three times per year) science team
> >> > reviews of progress - stretching out over a decade or more; multiple
> >> > publications, both jointly as a team and as individual contributions
> >> > to journals; multiple calibration and validation exercises in support
> >> > of establishing bounds on uncertainties; peer-review may involve
> >> > intercomparisons with competing instruments or data sources; data
> >> > publication requires resources for large-scale, special purpose data
> >> > centers owing to cost of computing resources, storage resources, and
> >> > curation over long periods.
> >> >
> >> > These could be neatened up - and perhaps enumerated.  We really need
> >> > samples of each different kind of scenario and group interaction.  Is
> >> > it worth writing these thoughts up into a format that can go into the
> >> > wiki?
> >> >
> >> > Bruce B.
> >> >
> >> >
> >> > At 04:06 PM 4/14/2010, Mark A. Parsons wrote:
> >> >> After hearing today's discussion, I thought it might be useful for
> >> >> everyone to see the essay that Ruth and I wrote on citations.
> >> >>
> >> >> Cheers,
> >> >>
> >> >> -m.
> >> >>
> >> >>
> >> >>
> >> >> On 14 Apr 2010, at 9:38 AM, Ruth Duerr wrote:
> >> >>
> >> >> > Wednesday March 10, 1 pm MST (3 pm EST)
> >> >> > Telephone: 877-326-0011
> >> >> > Meeting #: *4917475*
> >> >> > Agenda:
> >> >> >
> >> >> > - Identifiers paper status
> >> >> > - Identifiers testbed report
> >> >> > - Status of report on AGU townhall
> >> >> > - Provenance paper status
> >> >> > - Data management recommendations status
> >> >> > - Summer ESIP meeting plans
> >> >> > _______________________________________________
> >> >> > Esip-preserve mailing list
> >> >> > Esip-preserve at lists.esipfed.org
> >> >> > http://www.lists.esipfed.org/mailman/listinfo/esip-preserve
> >> >>
> >> >
> >



