[Esip-citationguidelines] Esip-citationguidelines Digest, Vol 24, Issue 3

Tue Dec 22 16:47:35 EST 2020

Hi Rob

Yes – it is so complex.

I am doing some work on magnetotelluric (MT) data in Australia, and I think we are going to move towards a DOI for each individual station in an MT survey/network, but that is a very confronting suggestion to some geophysicists, so it is softly, softly at the moment. The idea is that when the stations are aggregated into a dataset, that dataset also gets a DOI.

The UNAVCO people presented this work on DOIs at either the 2018 or the 2019 AGU meeting, and I tried to get them to publish something on their composite and aggregate DOIs, but I can’t see that they have done this yet (it’s like trying to get IRIS to publish a referenceable paper on their ‘Dirt-to-Desktop’ concept – hint, hint).

This way, with something like CRediT, we can finally start to acknowledge the people who go out in the field, dig the holes and actually collect the data, and, more importantly, recognise those who funded the data collection initiative. Once you move into the more highly evolved data products, these people are rarely if ever citable in a machine-readable way (if you are lucky, they are named in free text in the acknowledgements).

Critical to this are Data Versioning and the NASA processing levels. Have you seen the outputs of the RDA Data Versioning Working Group? This WG produced a white paper<https://rd-alliance.org/group/data-versioning-wg/outcomes/principles-and-best-practices-data-versioning-all-data-sets-big> based on 39 use cases<https://rd-alliance.org/group/data-versioning-wg/outcomes/compilation-data-versioning-use-cases-rda-data-versioning-working>.

It would be great to know if anyone else is working on this apart from UNAVCO and Ocean Networks Canada.

Take care

Lesley

From: Robert Casey <rob at iris.washington.edu>
Date: Wednesday, 23 December 2020 at 8:03 am
To: Lesley Wyborn <lesley.wyborn at anu.edu.au>
Cc: Mark Parsons <parsonsm.work at icloud.com>, Esip-citationguidelines <esip-citationguidelines at lists.esipfed.org>
Subject: Re: [Esip-citationguidelines] Esip-citationguidelines Digest, Vol 24, Issue 3

Thank you, Mark and Lesley, for your responses. I like where UNAVCO was going with this as well; it's a model they have put into practice, and it should certainly be a set piece for discussion.

For IRIS's part, we're currently implementing Network PIDs mainly for the purpose of credit.  This covers most aspects of the Campaign Dataset and Continuous Station Dataset identifiers that UNAVCO implements.

In terms of aggregation, IRIS has the notion of Virtual Networks, which are arbitrary collections of stations over specific periods of time. Some represent major, governance-driven efforts, while others are convenient tags developed internally to represent a widely accepted collection. However, I do not think there has been a consistent effort to provide DOI representations for these affiliations, though I do think they are important.

What we are not yet prepared for is providing the investigator with a single DOI for their specific custom data gather. This would indirectly serve the needs of credit, but it is really aimed at reproducibility. To get to this level, IRIS has to track the details of every request and version its datasets so that it can reproduce the data exactly as they existed when the piece was written.

All of this will take considerably more infrastructure, though it looked like Ocean Networks Canada was on the way to supporting versioned reproduction of data in their repository. Between this and UNAVCO's work on producing aggregates, it seems we're starting to see a convergence toward full reproducibility.

-Rob

On Dec 22, 2020, at 11:42 AM, Lesley Wyborn <lesley.wyborn at anu.edu.au> wrote:

Hi Rob and Mark

This topic has interested me for a while: ensuring credit goes to the person who created the component of the dataset that I am using, particularly for data collected from the same station over a long period of time, during which the PIs may change.

The best logical organisation I have seen is in these recommendations from UNAVCO, which sort GPS/GNSS datasets into:

1. Campaign;
2. Continuous (which is similar to the RDA dynamic data citation Mark mentions below);
3. Aggregated; and
4. Composite.

These further details are extracted from https://www.unavco.org/community/policies_forms/data-policy/data-policy-faq/data-policy-faq.html

•  How does UNAVCO handle GPS/GNSS dataset DOIs for campaigns vs. continuous/permanent stations vs. networks of stations and special cases where the Principal Investigators have changed over time?
For GPS/GNSS datasets (raw and RINEX data), UNAVCO publishes (assigns DOIs to) four different dataset types, all with associated data that have been archived to the quality standards described above. The types are GPS/GNSS Campaign Datasets; GPS/GNSS Continuous Station Datasets; Aggregated Datasets; and Composite Datasets. The first two are considered primary dataset types. The third and fourth are derived or secondary dataset types because they are composed of two or more datasets of the primary type.

1. GPS/GNSS Campaign<https://www.unavco.org/help/glossary/glossary.html#campaign> Dataset - This will be a dataset defined between UNAVCO and the Principal Investigators at the time of archiving, and will generally include observations as raw and/or RINEX data files and metadata from GPS/GNSS data collection at a number of recoverable monuments within a well-defined time window. Once archiving is complete and the DOI is assigned, there is no intention to add data to the campaign that extend the end time or otherwise modify the data included in the DOI.
2. GPS/GNSS Continuous Station<https://www.unavco.org/help/glossary/glossary.html#continuous%20site> Dataset - Observations and metadata from GPS/GNSS raw and/or RINEX data collection at a single recoverable monument. Unlike the campaign dataset type, which is complete and unchanging through time, the Continuous Station Dataset is open-ended (until the station is retired). The DOI will be associated with a dataset that grows through time; because of this, it is important when citing these data to qualify the citation with the access date and the temporal window of data used in the research. See Citation Guidance<https://www.unavco.org/community/policies_forms/attribution/attribution.html#citation> for a permanent/continuous station dataset.
3. Aggregated GPS/GNSS Datasets - These will often be an associated group of campaign datasets or a network of stations. A campaign example is the Mammoth/Mojave 1994 campaign - https://doi.org/10.7283/T57H1GGM, which consists of three individual primary datasets: Mammoth, Mojave, and Combined Sites. For permanent/continuous stations, networks or sub-networks of stations may be assigned an aggregated DOI. An example is the Plutons GPS Network - https://doi.org/10.7283/T5V98697. The collection of stations aggregated does not have to be a network; in that case, the purpose of the aggregated dataset is to collect a potentially large number of station DOIs for citing in a journal article (i.e., to avoid citation lists containing tens or hundreds of dataset references).
4. Composite GPS/GNSS Datasets - A composite dataset DOI is one that is composed of two or more subset DOIs that together make up what would normally be considered a single dataset. The most common example is a permanent (continuous) GPS/GNSS station where the principal investigator (author) changed at a particular point in time. The existing network (Nucleus) stations that were adopted by UNAVCO as part of PBO are examples. The entire dataset has one DOI and comprises a separate DOI for each time period with a different author or set of authors. An example is the composite DOI for the station NOMT - https://doi.org/10.7283/T5B27SN9
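To make the primary/derived distinction concrete, here is a minimal Python sketch, assuming a simple in-memory model. The class and function names, the placeholder DOIs, the author names and the dates are all hypothetical and do not reflect UNAVCO's actual systems or citation format. It shows how a composite DOI for a station whose PI changed could be walked to find everyone to credit for a given time window, and how a continuous-station citation could carry the access date and temporal window that the FAQ asks for.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class DatasetType(Enum):
    CAMPAIGN = "campaign"      # primary: closed, well-defined time window
    CONTINUOUS = "continuous"  # primary: open-ended until the station is retired
    AGGREGATED = "aggregated"  # derived: group of campaigns or a network of stations
    COMPOSITE = "composite"    # derived: one logical dataset split by author period


PRIMARY_TYPES = {DatasetType.CAMPAIGN, DatasetType.CONTINUOUS}


@dataclass
class Dataset:
    doi: str
    kind: DatasetType
    authors: List[str] = field(default_factory=list)
    start: Optional[date] = None
    end: Optional[date] = None  # None means still collecting (open-ended)
    members: List["Dataset"] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Primary datasets hold data directly; derived ones only point at members.
        if self.kind in PRIMARY_TYPES and self.members:
            raise ValueError("a primary dataset cannot have member datasets")
        if self.kind not in PRIMARY_TYPES and len(self.members) < 2:
            raise ValueError("a derived dataset needs two or more member datasets")


def credited_authors(ds: Dataset, start: date, end: date) -> List[str]:
    """Authors of every primary dataset overlapping [start, end], in order, no duplicates."""
    if ds.kind in PRIMARY_TYPES:
        ds_start = ds.start if ds.start is not None else date.min
        ds_end = ds.end if ds.end is not None else date.max
        overlaps = ds_start <= end and start <= ds_end
        return list(ds.authors) if overlaps else []
    names: List[str] = []
    for member in ds.members:
        for name in credited_authors(member, start, end):
            if name not in names:
                names.append(name)
    return names


def cite_continuous(ds: Dataset, accessed: date, start: date, end: date) -> str:
    """Qualify an open-ended station citation with the window used and the access date."""
    if ds.kind is not DatasetType.CONTINUOUS:
        raise ValueError("only continuous station datasets need this qualification")
    # Illustrative string layout only; not UNAVCO's published citation style.
    return (f"{'; '.join(ds.authors)}. https://doi.org/{ds.doi}. "
            f"Data from {start.isoformat()} to {end.isoformat()}, "
            f"accessed {accessed.isoformat()}.")


if __name__ == "__main__":
    # Hypothetical station whose PI changed in late 2008.
    period_a = Dataset("10.0000/example.sta1.a", DatasetType.CONTINUOUS, ["PI A"],
                       date(2000, 1, 1), date(2008, 9, 30))
    period_b = Dataset("10.0000/example.sta1.b", DatasetType.CONTINUOUS, ["PI B"],
                       date(2008, 10, 1))
    station = Dataset("10.0000/example.sta1", DatasetType.COMPOSITE,
                      members=[period_a, period_b])
    print(credited_authors(station, date(2005, 1, 1), date(2010, 12, 31)))  # ['PI A', 'PI B']
    print(cite_continuous(period_b, date(2020, 12, 22), date(2012, 1, 1), date(2012, 12, 31)))
```

Modelling composites and aggregates as containers of primary datasets means credit is always resolved at the level where the authorship actually lives, which is the point of splitting a station's record by PI period.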

Take care

Lesley

From: Esip-citationguidelines <esip-citationguidelines-bounces at lists.esipfed.org> on behalf of Mark Parsons via Esip-citationguidelines <esip-citationguidelines at lists.esipfed.org>
Reply to: Mark Parsons <parsonsm.work at icloud.com>
Date: Wednesday, 23 December 2020 at 4:49 am
To: Robert Casey <rob at iris.washington.edu>
Cc: Esip-citationguidelines <esip-citationguidelines at lists.esipfed.org>
Subject: Re: [Esip-citationguidelines] Esip-citationguidelines Digest, Vol 24, Issue 3

Hi Rob,

Different data centers take different approaches for different time series. For infrequently updated time series, it may be appropriate to assign a new PID with every update or to provide periodic “snapshots”. For frequently updated data (daily or more often), data centers will often assign a PID to the general data stream and only create a new one when there is a new version of the stream. This is discussed further in the ESIP guidelines: https://doi.org/10.6084/m9.figshare.8441816

For subsets, the RDA Dynamic Data Citation guidelines (https://dx.doi.org/10.15497/RDA00016) recommend that one provide a PID for the overall collection and then assign a PID to any arbitrary subset as obtained through a query. So a citation of a subset would have two PIDs: one for the collection and one for the subset.
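The query-based approach (and Rob's point about tracking every request and versioning the data) can be sketched as follows. This is a minimal, hypothetical Python illustration, not the RDA Recommendation's reference implementation or any data center's API. It assumes an append-only, timestamped collection, a query store that keeps each normalized query with its execution time and a SHA-256 digest of the sorted result, and a made-up subset identifier format; a real service would mint registered PIDs (e.g. DOIs or Handles) instead.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    key: str                      # e.g. "STATION/sample-time"
    value: float
    valid_from: datetime          # when this version entered the collection
    valid_to: Optional[datetime]  # when it was superseded; None means current


class VersionedCollection:
    """Append-only store: an update closes the old version instead of overwriting it."""

    def __init__(self, pid: str) -> None:
        self.pid = pid
        self.records: List[Record] = []

    def upsert(self, key: str, value: float, at: datetime) -> None:
        for i, r in enumerate(self.records):
            if r.key == key and r.valid_to is None:
                self.records[i] = Record(r.key, r.value, r.valid_from, at)
        self.records.append(Record(key, value, at, None))

    def as_of(self, ts: datetime) -> List[Record]:
        return [r for r in self.records
                if r.valid_from <= ts and (r.valid_to is None or ts < r.valid_to)]


class QueryStore:
    """Keeps each normalized query, its execution timestamp and a result-set digest."""

    def __init__(self, collection: VersionedCollection) -> None:
        self.collection = collection
        self.entries: Dict[str, dict] = {}

    def _run(self, query: dict, ts: datetime) -> List[Tuple[str, float]]:
        rows = [(r.key, r.value) for r in self.collection.as_of(ts)
                if r.key.startswith(query["key_prefix"])]
        return sorted(rows)  # deterministic order so the digest is repeatable

    @staticmethod
    def _digest(rows: List[Tuple[str, float]]) -> str:
        return hashlib.sha256(json.dumps(rows).encode("utf-8")).hexdigest()

    def cite(self, query: dict, at: Optional[datetime] = None) -> Tuple[str, str]:
        """Return the (collection PID, subset PID) pair to put in a citation."""
        ts = at if at is not None else datetime.now(timezone.utc)
        normalized = json.dumps(query, sort_keys=True)
        token = hashlib.sha256((normalized + ts.isoformat()).encode("utf-8")).hexdigest()[:12]
        subset_pid = f"{self.collection.pid}/query/{token}"  # hypothetical format
        self.entries[subset_pid] = {"query": normalized, "timestamp": ts,
                                    "digest": self._digest(self._run(query, ts))}
        return self.collection.pid, subset_pid

    def resolve(self, subset_pid: str) -> List[Tuple[str, float]]:
        """Re-run the stored query against the data as they were at citation time."""
        entry = self.entries[subset_pid]
        rows = self._run(json.loads(entry["query"]), entry["timestamp"])
        if self._digest(rows) != entry["digest"]:
            raise RuntimeError("re-executed subset differs from the cited one")
        return rows
```

Because updates close old versions rather than overwrite them, resolving an earlier subset PID still returns the values as they were when it was cited, and the digest check flags any silent change. The citation carries both PIDs, one for the collection and one for the subset.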

None of this has much to do with credit, however. For many time series the credit is the same for the collection and the granule or subset, but in some cases different individuals may be responsible for different granules within a collection and should therefore be credited accordingly. There has been a little work showing how the RDA methodology can be used to do this, but that was not the original intent of the Recommendation.

What we are finding as we go through this exercise is that credit is a human concern and often requires human judgement. We discuss this a bit in Parsons, M. A., R. E. Duerr, and M. B. Jones. 2019. “The history and future of data citation in practice.” Data Science Journal 18. https://doi.org/10.5334/dsj-2019-052

cheers,

-m.
