[Esip-preserve] Citations

Fri Apr 16 11:24:14 EDT 2010

Hi Bruce,

So where is this table?  I couldn't find it on the wiki.

- Ruth

On Apr 15, 2010, at 7:20 PM, Alice Barkstrom wrote:

> I'll agree.  I found the NRC report and translated their categories
> into a table (with some bulging at the seams).  Then, I added
> in three categories that seem to fit - but still needed a fourth
> for the operational data production (which is the one where latency
> requirements force inclusion of the "perturbations" into the data
> files because there isn't time to make the record homogeneous).
> 
> Also, I'm pretty concerned with the recording (or, in the provenance
> world, the provenance "tracking") of the perturbations.  In my experience,
> there's a lot of "tacit knowledge" that producers don't write down that
> has a serious influence on the reproducibility of data production
> algorithms.  Or, to put it another way, I don't think the ATBDs are
> really high-fidelity recordings of what is actually going on in the
> operational algorithms.  In still another way of putting it, I expect
> that the amount of time required to reconstruct an algorithm and
> really ensure that it replicates what is being done operationally
> is about the same amount of effort that's required to develop the
> operational software - meaning hundreds of person-hours for some
> of the operational or - more importantly - the climate data record code.
> 
> Next steps on my part will be to extend the table to create
> a context for the use cases that shows such impacts as
> demands on the producers and demands on the accepting
> archives - with - maybe - some comments on what users
> experience.  Many of the comments I get are related to what
> archives experience in trying to accommodate what they get
> from producers.  The user experience is often something
> different from either - and often our comments are not well
> supported by empirical evidence from the actual user community.
> I have strong opinions along the line that the IT community wants
> to do certain things because they receive "good marks" from their
> colleagues - whether or not the user community benefits.
> 
> Bruce B.
> 
> At 04:46 PM 4/15/2010, Mark A. Parsons wrote:
>> I don't think there is necessarily a direct connection between Bruce's paradigms and the NSB categories. While there are often parallels, data sets can evolve through the NSB categories with use. For example, the IPA Permafrost Map began as a research collection and then, as it was improved and more consistently compiled, it became a community collection. Now it is the benchmark of permafrost distribution and is used by multiple disciplines as a reference collection. All the while, it remains in Bruce's category 2.  So while some comparison to the NSB categories is instructive, it isn't exact.
>> 
>> Cheers,
>> 
>> -m.
>> On 15 Apr 2010, at 8:13 AM, Alice Barkstrom wrote:
>> 
>> > I suspect that the production paradigms create a collection organization structure
>> > that could stabilize our understanding and ensure representativeness to the use
>> > cases we choose.  This kind of structural work would also provide a checklist that
>> > could be used to make it easier to classify the kind of cases we're dealing with.
>> > I'll take a look at the NSB report and see if I can merge the suggestion I made
>> > yesterday with that categorization.
>> >
>> > Bruce B.
>> >
>> > At 06:52 PM 4/14/2010, Ruth Duerr wrote:
>> >> Actually these descriptions correspond pretty well to the descriptions of research, resource, and reference collections in the NSB report (National Science Board. 2005. Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. Washington, DC: National Science Foundation. 87 pp.), despite the fact that you are talking about production approaches and they are talking about types of data.
>> >>
>> >> Ruth
>> >>
>> >> On Apr 14, 2010, at 3:07 PM, Alice Barkstrom wrote:
>> >>
>> >> > It may be useful to deal with a simple separation of approaches to
>> >> > production that incorporates the size of the groups involved:
>> >> >
>> >> > 1.  Single author production and publication - classic sociological scenario
>> >> > that has supported a great deal of previous work
>> >> >
>> >> > Scenario: author collects measurements, analyzes the data, and writes
>> >> > up a summary paper; data may be preserved on paper, or in electronic
>> >> > files; peer-review accomplished by submission of paper to journal, with
>> >> > a moderate number (three to five) of referees; data publication would involve
>> >> > having paper or electronic copies of data accepted by a library or data center
>> >> >
>> >> > 2.  Working group production and publication - field experiment (of a variety
>> >> > of different kinds) would be a typical example
>> >> >
>> >> > Scenario: group sets up equipment, with single person in charge of each
>> >> > instrument that will collect data, management of WG done by one or two
>> >> > people (PI); data from individual instruments combined and intercompared
>> >> > within the group; data preserved in electronic files - which may be distributed
>> >> > amongst the WG; each instrument's scientist writes up a paper on his or her
>> >> > data; peer-review accomplished by submission of papers to a journal special
>> >> > issue and perhaps a special editor who selects a fair number of referees;
>> >> > data publication requires formal accession planning by a data center owing
>> >> > to the volume of data and the cost of curation
>> >> >
>> >> > 3.  Large-scale production and publication - "Big Science" owing to the size
>> >> > of the effort involved
>> >> >
>> >> > Scenario: instrument and producer teams selected by large-scale proposal
>> >> > effort - may involve one hundred to two hundred people over a decade; long time
>> >> > period (5 years is typical) of preparation before data collection begins, including
>> >> > design of production system and data production software; substantial pre-collection
>> >> > peer-review, including ATBDs and related algorithm outlines, as well as such documentation
>> >> > as coordinate transformations, data formats, calibration plans and procedures, etc.;
>> >> > production highly rigid, with extensive planning and scheduling; periodic (two to three
>> >> > times per year) science team reviews of progress - stretching out over a decade or
>> >> > more; multiple publications, both jointly as a team and as individual contributions to
>> >> > journals; multiple calibration and validation exercises in support of establishing bounds
>> >> > on uncertainties; peer-review may involve intercomparisons with competing instruments
>> >> > or data sources; data publication requires resources for large-scale, special purpose
>> >> > data centers owing to cost of computing resources, storage resources, and curation
>> >> > over long periods.
>> >> >
>> >> > These could be neatened up - and perhaps enumerated.  We really need samples of
>> >> > each different kind of scenario and group interaction.  Is it worth writing these thoughts up into
>> >> > a format that can go into the wiki?
>> >> >
>> >> > Bruce B.
>> >> >
>> >> >
>> >> > At 04:06 PM 4/14/2010, Mark A. Parsons wrote:
>> >> >> After hearing today's discussion, I thought it might be useful for everyone to see the essay that Ruth and I wrote on citations.
>> >> >>
>> >> >> Cheers,
>> >> >>
>> >> >> -m.
>> >> >>
>> >> >>
>> >> >>
>> >> >> On 14 Apr 2010, at 9:38 AM, Ruth Duerr wrote:
>> >> >>
>> >> >> > Wednesday March 10, 1 pm MST (3 pm EST)
>> >> >> > Telephone: 877-326-0011
>> >> >> > Meeting #: *4917475*
>> >> >> > Agenda:
>> >> >> >
>> >> >> > - Identifiers paper status
>> >> >> > - Identifiers testbed report
>> >> >> > - Status of report on AGU townhall
>> >> >> > - Provenance paper status
>> >> >> > - Data management recommendations status
>> >> >> > - Summer ESIP meeting plans
>> >> >> > _______________________________________________
>> >> >> > Esip-preserve mailing list
>> >> >> > Esip-preserve at lists.esipfed.org
>> >> >> > http://www.lists.esipfed.org/mailman/listinfo/esip-preserve
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >
>