[Esip-preserve] Next Telecon
Kenneth.Casey at noaa.gov
Tue Nov 10 09:06:46 EST 2009
On Nov 10, 2009, at 8:47 AM, Mark A. Parsons wrote:
> Good thoughts, Ken.
> When I said we should encourage AGU journals to take the lead, I
> meant by requiring their authors to cite the data they use.
> Theoretically, they do, but in reality data are rarely cited in any
> formal way.
Good point, and I agree with that COMPLETELY. It is the
responsibility of reviewers as well. I cannot count how many times
even well-documented data sets are mislabeled and misreferenced in
the peer-reviewed literature. I am most familiar with this in the
world of sea surface temperature data sets, where I am deeply
embedded and knowledgeable about the various products out there. So,
when I read a paper I can tell right away if the authors got it right
or wrong... and unfortunately they often get it wrong and reviewers
often don't catch it.
> Nonetheless, you raise a good point about the responsibility of the
> data publishers. Which raises the question: who is a data publisher?
> Or more importantly who is a trusted publisher? And how do they
> earn that trust? One way to earn trust is to provide a recommended
> citation and then by reliably providing the *exact* data that are
> cited. As you say, that is tricky for very dynamic data sets. For
> relatively static data sets, I think the citation problem is fairly
> straightforward and analogous to citing a book. We should urge
> that approach and AGU journals should require it. The data
> publisher is important here to make sure the data can be
> unambiguously retrieved by someone reading the paper, but I don't
> think data need to be held by a proper data center (or whatever) in
> order to be cited. Proper publishers should carry more weight (like
> a peer-reviewed journal vs. grey lit.), but we don't want to
> discourage data citation just because the data don't live in some
> "approved" data center.
> I think we should strongly encourage the book-style citation
> approach as an interim solution while we continue to discuss and
> pursue a more rigorous approach whereby individual data files or
> granules can be specifically identified in an unambiguous and
> permanent way--the concept of open, linked data you allude to with
> your WWW analogy.
I think this is probably a good approach, though perhaps with one
caveat... given that it can take a long time for manuscripts to make
it to print, and that datasets can be dynamic, I think when citing a
data set it is important to include WHEN the citation was made. I
think you are supposed to do the same thing when citing web sites,
for the same reasons.
> Regarding DOIs, I probably don't understand them, but it seems to
> me that they provide no more than simple due diligence. They do
> nothing to keep track of the ephemeral website, if the registry is
> not updated, and they don't provide any more permanence or
> unambiguity than an organization that maintains its URLs well. For
> example, NSIDC data set URLs have been consistent and reliable
> since before DOIs were invented. It's not a big deal really, but I
> just don't see what DOIs give us other than a false sense of
> security. I'm probably missing something.
I think I see what you mean - that the infrastructure behind the DOI
is at least as important as the DOI itself. We too have fixed URLs
for every archived data set: ours have been maintained for about a
decade now since the system was put in place... and it is not an easy
thing to do, since the archive management systems that ensure those
URLs are quite robust and rely on a lot of things... everything from
technologies, to standard operating procedures of our archive
managers, to stability of the organization itself.
> Anyway, much to discuss...
> On 10 Nov 2009, at 4:50 AM, Kenneth Casey wrote:
>> On Nov 9, 2009, at 10:49 PM, Mark A. Parsons wrote:
>>> Hi all,
>>> Some thoughts on the AGU Townhall for discussion tomorrow:
>>> We are scheduled for an hour on Thursday 1930-2030 (see
>>> description below). We want to introduce the topic and get
>>> everyone thinking, but we also want to allow for discussion. I
>>> think we should allow at least 1/2 hour for discussion.
>>> Our current plan is to have Bernard introduce the AGU position
>>> statement and then have Rob or Ruth speak on ESIP activities
>>> (including the work on identifiers). We also talked about
>>> introducing specific approaches to data citation. I mentioned the
>>> IPY guidelines. Bob Cook pointed out that ORNL has a similar
>>> approach. Indeed, about a decade ago NSIDC introduced the concept
>>> to all the DAACs, which supposedly adopted it across the board.
>>> Other organizations, including GBIF and PANGAEA, also have
>>> approaches. All these approaches are similar but not identical.
>>> Do we want to achieve some sort of commonality?
>> I would think the answer to that question is "yes". It seems that
>> broad adoption of data citations will be hard enough and perhaps
>> nearly impossible if there is not a single, clear, and simple way
>> to do it - one that is closely analogous to the way we cite
>> manuscripts now.
>>> I think we want AGU journals to take a lead on the issue. How can
>>> we do that? I'm happy to give a bit of an overview, but I'll need
>>> help. There are several issues that none of the approaches have
>>> fully addressed, including making citations machine
>>> understandable and capturing specific versions or subsets of data
>>> in a citation.
>> I am not sure about the journals taking the lead... I don't
>> necessarily disagree but it is also not entirely clear to me. I
>> think what you are talking about is not so much the journals
>> themselves but rather their publishers... in other words, the
>> question is, "When it comes to data set citations, who should be
>> the authoritative entity?" Is that the question? If so, I would
>> tend to think that the answer is somewhat similar to the answer
>> you would get to the question, "When it comes to manuscript
>> citations, who is the authoritative entity?" The short answer to
>> that question is "the publishers" but that alone is not enough
>> since it leads to "who can be a publisher?" And I think the
>> answer to that question is something like, "Well, any organization
>> that can demonstrate sufficient reliability to be respected by the
>> community and accepted as a trusted entity." In short, a sort of
>> survival of the fittest approach. I haven't really thought about
>> this very deeply, so I may be missing important points, but
>> natural candidates for such community acceptance would be national
>> data centers, perhaps some universities, etc. Coming from a
>> national data center, I am sure my perspective is influenced, but
>> I do know that issues of trust, reliability, and openness are very
>> important to us (and that we don't always have that trust and so
>> must continually work to earn it and be worthy of it).
>> I think my gut-level reaction to the idea of journal publishers
>> taking the lead on data set citations stems from what I perceive
>> as a large difference in what it takes to steward a manuscript
>> over time (from submission, through verification/peer review,
>> publication, and long-term preservation) when compared to what it
>> takes to steward a data set over time. Think about the World Wide
>> Web for a moment - the Hypertext Transfer Protocol (HTTP) took off
>> with amazing speed and blazed across the world because rendering
>> text and hyperlinks on a client is a relatively straightforward
>> thing to do. Please don't think I am diminishing the achievement
>> in any way, but in comparison finding, sending, and understanding
>> data across the internet is an ongoing challenge that continues to
>> be addressed by many, many people. I think the same can be said
>> when it comes to publishing text vs. publishing data. While by no
>> means easy or perfect, the process of peer-review, publication,
>> and citation of text seems very straightforward when compared to
>> the same steps for data. For example, not too many journal
>> articles I know are updated every five minutes (like a data set
>> from a moored tropical buoy) or revised many times (like the way
>> we reprocess many satellite data sets over and over again). The
>> granularity of a manuscript seems so simple when compared to
>> selecting/defining the granularity of a "data set". Minor
>> algorithm differences can have huge impacts on a data set.
>> Imagine if the entire results and conclusions of a manuscript
>> could change dramatically if you altered the location of a comma
>> in the text.
>>> Then there is the issue of data peer-review. There are specific
>>> peer-reviewed journals devoted to data publication, such as
>>> _Earth System Science Data_ and _Ecological Archives_. Personally, I
>>> think this approach is limited and even misguided, but I am
>>> probably unusual in that regard (I don't like DOIs either).
>> Can you summarize why you don't like DOIs?
>>> Bottom line is that we have to determine what we want to
>>> accomplish out of this town hall, and the best way to get there.
>>> That's the topic for tomorrow.
>> Unfortunately I cannot make the telecon, but I will look forward
>> to the email discussions!
>>> Talk soon,
>>> Peer-Reviewed Data Publication and Other Strategies to Sustain
>>> Verifiable Science
>>> Moscone West, Room 2008
>>> Cosponsored by EP, IN
>>> Objective, verifiable science requires formal, reviewed
>>> publication of both data and research results. Data publication
>>> facilitates essential scientific processes including
>>> transparency, reproducibility, documentation of uncertainty, and
>>> preservation. The AGU Council reaffirmed this fundamental
>>> responsibility in a revised position statement. Nonetheless, data
>>> publication lacks established cultural practices and quality
>>> standards for modern, complex, digital data sets. This town hall
>>> meeting will present the AGU position statement and evolving
>>> international data publication mechanisms. We seek input from all
>>> disciplines on state-of-the-art approaches for data peer-review,
>>> peer-recognition, citation, and other verification practices. The
>>> Federation of Earth Science Information Partners will publish
>>> discussion results.
>>> On 9 Nov 2009, at 11:09 AM, Ruth Duerr wrote:
>>>> Well it is unanimous. The next telecon is tomorrow at 11 am MST
>>>> (1 pm EST, 10 am PST). Agenda (with leads indicated) includes:
>>>> - EP-TOMS plans (John Moses)
>>>> - Preparations for AGU town hall (Mark Parsons)
>>>> - Draft ESIP statement on data (Ruth)
>>>> - Planning for winter ESIP meeting (Ruth)
>>>> - Others?
>>>> Esip-preserve mailing list
>>>> Esip-preserve at lists.esipfed.org
>> [NOTE: The opinions expressed in this email are those of the
>> author alone and do not necessarily reflect official NOAA,
>> Department of Commerce, or US government policy.]
>> Kenneth S. Casey, Ph.D.
>> Technical Director
>> NOAA National Oceanographic Data Center
>> 1315 East-West Highway
>> Silver Spring MD 20910
>> 301-713-3272 ext 133
[NOTE: The opinions expressed in this email are those of the author
alone and do not necessarily reflect official NOAA, Department of
Commerce, or US government policy.]
Kenneth S. Casey, Ph.D.
NOAA National Oceanographic Data Center
1315 East-West Highway
Silver Spring MD 20910
301-713-3272 ext 133