[Esip-preserve] Next Telecon

Thu Nov 12 07:00:39 EST 2009

I think what emerges from this conversation is a realistic
expectation about how "authorship" can change over time.
At a slightly more detailed level of provenance, there is also the
possibility of archives reformatting the data in a collection - no
changes in the numbers, simply a rearrangement of order.  That
would need to be tracked as well.

It would seem appropriate to record how different standards treat
this problem.  Sounds like PREMIS has hooks in place to handle
the evolution of authorship, while the ISO 19115 family of standards
is more primitive.  I don't think OAIS RM deals with this issue.
Ditto for GCMD.  It has occurred to me that there are some interesting
information preservation issues in ensuring that the software that
created a particular data set remains understandable when the original
authors of the software leave a project and custodianship moves to
an archive.  The ERBE data set for which I was essentially PI provides
a good example, in that most of the people who put together that
software are now retired.  The CERES team retains some knowledge
of the ERBE work, but I don't think there's anyone at the LaRC ASDC
who could put the original software back together.

It also occurred to me that one of the more complex but very
important data collections to incorporate into our discussion is
the data used in meteorological reanalyses.  If I recall correctly,
much of the original data are radiosonde temperature and humidity
profiles that are edited to remove various "imperfections" in the
measurements.  There are several collections of these data,
including those at GSFC, NOAA, NCAR, and European sites.
Each collection has slightly different editing approaches and
some of the operational reanalysis sites maintain statistical
records that add further editing to the mix.

Bruce B.

At 10:36 PM 11/11/2009, Ruth Duerr wrote:
>Hi Bruce,
>
>I've got a couple of thoughts interspersed below...
>
>Ruth
>
>On Nov 11, 2009, at 7:58 PM, Mark A. Parsons wrote:
>
>>Good thoughts as usual, Bruce. One thing we have to consider in all 
>>this is what is actually workable for the journals. This is why 
>>DOIs have taken off. Not because of their inherent merits, which 
>>are few, but because the journals like and even demand them. Most 
>>journals are very conservative (some resisted web citations into 
>>this millennium). This conservatism is warranted in many ways, but 
>>eventually it will have to yield to a new scientific publishing paradigm.
>>
>>-m.
>>On 11 Nov 2009, at 7:05 AM, Alice Barkstrom wrote:
>>
>>>One question to think about in this context is what happens when
>>>investigation teams disband but the data are still "alive".  Who
>>>becomes the "data publisher"?  If the "data set" is replicated after
>>>the producer team disbands, who is the authority or the "publisher"?
>>>Note that the EOS teams are all rather long-in-the-tooth.  What
>>>happens after Terra, Aqua, and Aura die and are no longer producing
>>>data?
>
>Just like with regular journals, the citation to a particular data 
>set (and here I include version within the definition) does not 
>change just because the location of the official archive has changed 
>or because the next version was created by somebody other than the 
>person/organization responsible for the original version.
>
>If somebody other than the original author reprocesses the data and 
>creates a new data set (either as a new version or as a whole new 
>data set) then they are cited for that work.
>
>I don't know how it could work in a fair manner otherwise.  This is 
>parallel to the regular publishing world where if I write the first 
>edition of a book, then I will always be the author of that edition 
>of the book, even if other people participate in writing the next 
>version of the book.  Likewise, if one publisher publishes the first 
>version they will always be the publisher of that version, even if 
>some other company publishes the next version (because perhaps the 
>second company bought out the first company or at least bought the 
>rights to publication of the next version).
>
>>>As a further note, in the ISO 19115 and related standards, there's
>>>the notion of "Responsible Party" with a small number of roles that
>>>are assignable, including PI.  There isn't a clear notion in the
>>>standards of how to deal with long-term changes in the parties who
>>>become responsible, including the transition from data production
>>>to long-term stewardship.  In EOS, the data centers often serve as
>>>"publishers", although some teams also "publish" data.  What will
>>>happen in the citations under such transitions does not seem to be
>>>clearly spelled out as far as I can determine.
>
>Actually PREMIS explicitly allows tracking of those changes in responsibility.
>>>
>>>As a note, I was participating in a meeting last week and was
>>>travelling home yesterday.  Sorry to miss the telecon.
>>>
>>>Bruce B.
>>>
>>>At 06:50 AM 11/10/2009, Kenneth Casey wrote:
>>>>Mark,
>>>>
>>>>
>>>>On Nov 9, 2009, at 10:49 PM, Mark A. Parsons wrote:
>>>>
>>>>>Hi all,
>>>>>
>>>>>Some thoughts on the AGU Townhall for discussion tomorrow:
>>>>>
>>>>>We are scheduled for an hour on Thursday 1930-2030 (see 
>>>>>description below). We want to introduce the topic and get 
>>>>>everyone thinking, but we also want to allow for discussion. I 
>>>>>think we should allow at least 1/2 hour for discussion.
>>>>>
>>>>>Our current plan is to have Bernard introduce the AGU position 
>>>>>statement and then have Rob or Ruth speak on ESIP activities 
>>>>>(including the work on identifiers). We also talked about 
>>>>>introducing specific approaches to data citation. I mentioned 
>>>>>the IPY guidelines. Bob Cook pointed out that ORNL has a similar 
>>>>>approach. Indeed, about a decade ago NSIDC introduced the 
>>>>>concept to all the DAACs who supposedly adopted it across the 
>>>>>board. Other organizations, including GBIF and PANGAEA, also have 
>>>>>approaches. All these approaches are similar but not 
>>>>>identical. Do we want to achieve some sort of commonality?
>>>>
>>>>I would think the answer to that question is "yes".  It seems 
>>>>that broad adoption of data citations will be hard enough and 
>>>>perhaps nearly impossible if there is not a single, clear, and 
>>>>simple way to do it - one that is closely analogous to the way we 
>>>>cite manuscripts now.
>>>>
>>>>>I think we want AGU journals to take a lead on the issue. How 
>>>>>can we do that? I'm happy to give a bit of an overview, but I'll 
>>>>>need help. There are several issues that none of the approaches 
>>>>>have fully addressed, including making citations machine 
>>>>>understandable and capturing specific versions or subsets of 
>>>>>data in a citation.
>>>>
>>>>I am not sure about the journals taking the lead... I don't 
>>>>necessarily disagree but it is also not entirely clear to me.  I 
>>>>think what you are talking about is not so much the journals 
>>>>themselves but rather their publishers. In other words, the 
>>>>question is, "When it comes to data set citations, who should be 
>>>>the authoritative entity?"  Is that the question?  If so, I would 
>>>>tend to think that the answer is somewhat similar to the answer 
>>>>you would get to the question, "When it comes to manuscript 
>>>>citations, who is the authoritative entity?" The short answer to 
>>>>that question is "the publishers" but that alone is not enough 
>>>>since it leads to "who can be a publisher?"  And I think the 
>>>>answer to that question is something like, "Well, any 
>>>>organization that can demonstrate sufficient reliability to be 
>>>>respected by the community and accepted as a trusted entity."  In 
>>>>short, a sort of survival of the fittest approach.  I haven't 
>>>>really thought about this very deeply, so I may be missing 
>>>>important points, but natural candidates for such community 
>>>>acceptance would be national data centers, perhaps some 
>>>>universities, etc.  Coming from a national data center, I am sure 
>>>>my perspective is influenced, but I do know that issues of trust, 
>>>>reliability, and openness are very important to us (and that we 
>>>>don't always have that trust and so must continually work to earn 
>>>>it and be worthy of it).
>>>>
>>>>I think my gut-level reaction to the idea of journal publishers 
>>>>taking the lead on data set citations stems from what I perceive 
>>>>as a large difference in what it takes to steward a manuscript 
>>>>over time (from submission, through verification/peer review, 
>>>>publication, and long-term preservation) when compared to what it 
>>>>takes to steward a data set over time.  Think about the World 
>>>>Wide Web for a moment - the Hypertext Transfer Protocol (HTTP) took 
>>>>off with amazing speed and blazed across the world because 
>>>>rendering text and hyperlinks on a client is a relatively 
>>>>straightforward thing to do. Please don't think I am diminishing 
>>>>the achievement in any way, but in comparison finding, sending, 
>>>>and understanding data across the internet is an ongoing 
>>>>challenge that continues to be addressed by many, many 
>>>>people.  I think the same can be said when it comes to 
>>>>publishing text vs. publishing data.  While by no means easy or 
>>>>perfect, the process of peer-review, publication, and citation of 
>>>>text seems very straightforward when compared to the same steps 
>>>>for data.  For example, not too many journal articles I know are 
>>>>updated every five minutes (like a data set from a moored 
>>>>tropical buoy) or revised many times (like the way we reprocess 
>>>>many satellite data sets over and over again).  The granularity 
>>>>of a manuscript seems so simple when compared to 
>>>>selecting/defining the granularity of a "data set".  Minor 
>>>>algorithm differences can have huge impacts on a data 
>>>>set.  Imagine if the entire results and conclusions of a 
>>>>manuscript could change dramatically if you altered the location 
>>>>of a comma in the text.
>>>>
>>>>
>>>>>Then there is the issue of data peer-review. There are specific 
>>>>>peer-reviewed journals devoted to data publication, such 
>>>>>as _Earth System Science Data_ and _Ecological Archives_.  Personally, 
>>>>>I think this approach is limited and even misguided, but I am 
>>>>>probably unusual in that regard (I don't like DOIs either).
>>>>
>>>>Can you summarize why you don't like DOIs?
>>>>
>>>>>Bottom line is that we have to determine what we want to 
>>>>>accomplish out of this townhall, and the best way to get there. 
>>>>>That's the topic for tomorrow.
>>>>
>>>>Unfortunately I cannot make the telecon, but I will look forward 
>>>>to the email discussions!
>>>>
>>>>Ken
>>>>
>>>>>Talk soon,
>>>>>
>>>>>-m.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>Peer-Reviewed Data Publication and Other Strategies to Sustain 
>>>>>Verifiable Science
>>>>>
>>>>>Moscone West, Room 2008
>>>>>Cosponsored by EP, IN
>>>>>
>>>>>Objective, verifiable science requires formal, reviewed 
>>>>>publication of both data and research results. Data publication 
>>>>>facilitates essential scientific processes including 
>>>>>transparency, reproducibility, documentation of uncertainty, and 
>>>>>preservation. The AGU Council reaffirmed this fundamental 
>>>>>responsibility in a revised position statement
>>>>><http://www.agu.org/outreach/science_policy/positions/geodata.shtml>.
>>>>>Nonetheless, data publication lacks 
>>>>>established cultural practices and quality standards for modern, 
>>>>>complex, digital data sets. This town hall meeting will present 
>>>>>the AGU position statement and evolving international data 
>>>>>publication mechanisms. We seek input from all disciplines on 
>>>>>state-of-the-art approaches for data peer-review, 
>>>>>peer-recognition, citation, and other verification practices. 
>>>>>The Federation of Earth Science Information Partners will 
>>>>>publish discussion results.
>>>>>
>>>>>
>>>>>
>>>>>On 9 Nov 2009, at 11:09 AM, Ruth Duerr wrote:
>>>>>
>>>>>>Well it is unanimous.  The next telecon is tomorrow at 11 am 
>>>>>>MST (1 pm EST, 10 am PST).  Agenda (with leads indicated) includes:
>>>>>>
>>>>>>- EP-TOMS plans (John Moses)
>>>>>>- Preparations for AGU town hall (Mark Parsons)
>>>>>>- Draft ESIP statement on data (Ruth)
>>>>>>- Planning for winter ESIP meeting (Ruth)
>>>>>>- Others?
>>>>>>
>>>>>>
>>>>>>_______________________________________________
>>>>>>Esip-preserve mailing list
>>>>>>Esip-preserve at lists.esipfed.org
>>>>>>http://www.lists.esipfed.org/mailman/listinfo/esip-preserve
>>>>>
>>>>
>>>>
>>>>[NOTE: The opinions expressed in this email are those of the 
>>>>author alone and do not necessarily reflect official NOAA, 
>>>>Department of Commerce, or US government policy.]
>>>>
>>>>Kenneth S. Casey, Ph.D.
>>>>Technical Director
>>>>NOAA National Oceanographic Data Center
>>>>1315 East-West Highway
>>>>Silver Spring MD 20910
>>>>301-713-3272 ext 133
>>>>http://www.nodc.noaa.gov/
>>>>
>>