[Esip-citationguidelines] Esip-citationguidelines Digest, Vol 24, Issue 3

Tue Dec 22 14:42:02 EST 2020

Hi Rob and Mark

This topic has been something that has interested me for a while – that is, in ensuring credit to the person that created the component of the dataset that I am using, particularly for data collected from the same station but over a long period of time where the PIs may change.

The best logical organisation I have seen is in these recommendations from UNAVCO, which sort GPS/GNSS datasets into

  1.  Campaign;
  2.  Continuous (which is similar to the RDA dynamic data citation Mark mentions below);
  3.  Aggregated; and
  4.  Composite.

These further details are extracted from https://www.unavco.org/community/policies_forms/data-policy/data-policy-faq/data-policy-faq.html

•  How does UNAVCO handle GPS/GNSS dataset DOIs for campaigns vs. continuous/permanent stations vs. networks of stations and special cases where the Principal Investigators have changed over time?
For GPS/GNSS datasets (raw and RINEX data), UNAVCO publishes (assigns DOIs) for four different dataset types, all with associated data that have been archived to quality standards described above. The types are GPS/GNSS Campaign Datasets; GPS/GNSS Continuous Station Datasets; Aggregated Datasets; and Composite Datasets. The first two are considered primary dataset types. The third and fourth types are derived or secondary dataset types because they are composed of two or more datasets of the primary type.

  1.  GPS/GNSS Campaign<https://www.unavco.org/help/glossary/glossary.html#campaign> Dataset - This will be a dataset defined between UNAVCO and the Principal Investigators at the time of archiving, and generally will include observations as raw and/or RINEX data files and metadata from GPS/GNSS data collection at a number of recoverable monuments that occurred within a well-defined time window. Once archiving is complete and the DOI is assigned, there is no intention to add data to the campaign that extends the end time or otherwise modifies the data included in the DOI.

  2.  GPS/GNSS Continuous Station<https://www.unavco.org/help/glossary/glossary.html#continuous%20site> Dataset - Observations and metadata from GPS/GNSS raw and/or RINEX data collection at a single recoverable monument. Unlike the campaign dataset type, which is complete and unchanging through time, the Continuous Station Dataset is open ended (until the station is retired). The DOI will be associated with an increasing dataset through time; because of this aspect of this dataset type it is important when citing this data to qualify the citation with an access date of the data and the temporal window of data used in the research. See Citation Guidance<https://www.unavco.org/community/policies_forms/attribution/attribution.html#citation> for a permanent/continuous station dataset.

  3.  Aggregated GPS/GNSS Datasets - These will often be an associated group of campaign datasets or a network of stations. A campaign example is the Mammoth/Mojave 1994 campaign - https://doi.org/10.7283/T57H1GGM, which consists of three individual primary datasets (Mammoth, Mojave, and Combined Sites). For permanent/continuous stations, networks or sub-networks of stations may be assigned an aggregated DOI. An example is Plutons GPS Network - https://doi.org/10.7283/T5V98697. The collection of stations aggregated does not have to be a network; in this case, the purpose of the aggregated dataset is for collecting a potentially large number of station DOIs for citing in a journal article (i.e., in order to avoid citation lists containing tens or hundreds of dataset references).

  4.  Composite GPS/GNSS Datasets - A composite dataset DOI is one that is comprised of two or more subset DOIs that together make up what would normally be considered to be a single dataset. The most common example is a permanent (continuous) GPS/GNSS station where the principal investigator (author) changed at a particular point in time. The existing network (Nucleus) stations that were adopted by UNAVCO as part of PBO are examples. The entire dataset is one DOI and is comprised of a separate DOI for each time period with a different author or set of authors. An example is the composite DOI for the station NOMT - https://doi.org/10.7283/T5B27SN9
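
To make the composite case concrete, here is a minimal Python sketch of how one might work out which per-period DOIs (and hence which authors) to credit for a given window of data, and qualify the citation with the access date and temporal window as the continuous-station guidance asks. This is only an illustration, not UNAVCO's implementation: the SubDataset model, the function names, and every DOI, name, and date in it are invented placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SubDataset:
    """One authorship period inside a composite dataset (hypothetical model)."""
    doi: str
    authors: Tuple[str, ...]
    start: date
    end: Optional[date]  # None means the period is still open


def subsets_to_credit(parts: List[SubDataset], used_start: date, used_end: date) -> List[SubDataset]:
    """Return the sub-datasets whose period overlaps the window of data used."""
    hits = []
    for part in parts:
        part_end = part.end or date.max  # treat an open period as unbounded
        if part.start <= used_end and used_start <= part_end:
            hits.append(part)
    return hits


def citation_note(parts: List[SubDataset], used_start: date, used_end: date, accessed: date) -> str:
    """Build a plain-text qualifier: temporal window, access date, DOIs to cite."""
    dois = ", ".join(p.doi for p in subsets_to_credit(parts, used_start, used_end))
    return (
        f"Data used: {used_start.isoformat()} to {used_end.isoformat()}; "
        f"accessed {accessed.isoformat()}; DOIs: {dois}"
    )


# Placeholder records: these DOIs and names are invented for illustration only.
PARTS = [
    SubDataset("10.0000/EXAMPLE-A", ("Investigator A",), date(1996, 1, 1), date(2007, 12, 31)),
    SubDataset("10.0000/EXAMPLE-B", ("Investigator B", "Investigator C"), date(2008, 1, 1), None),
]

if __name__ == "__main__":
    print(citation_note(PARTS, date(2007, 6, 1), date(2008, 6, 1), date(2020, 12, 22)))
```

A window that straddles the change of investigator picks up both per-period DOIs, while a window entirely inside one period picks up only that one.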

Take care

Lesley

From: Esip-citationguidelines <esip-citationguidelines-bounces at lists.esipfed.org> on behalf of Mark Parsons via Esip-citationguidelines <esip-citationguidelines at lists.esipfed.org>
Reply to: Mark Parsons <parsonsm.work at icloud.com>
Date: Wednesday, 23 December 2020 at 4:49 am
To: Robert Casey <rob at iris.washington.edu>
Cc: Esip-citationguidelines <esip-citationguidelines at lists.esipfed.org>
Subject: Re: [Esip-citationguidelines] Esip-citationguidelines Digest, Vol 24, Issue 3

Hi Rob,

Different data centers take different approaches for different time series. For infrequently updated time series, it may be appropriate to assign a new PID with every update or provide periodic “snapshots”. For frequently updated data (daily or more often), data centers will often assign a PID to the general data stream and only create a new one when there is a new version of the stream. This is discussed more in the ESIP guidelines. https://doi.org/10.6084/m9.figshare.8441816.

For subsets, the RDA Dynamic Data Citation guidelines (https://dx.doi.org/10.15497/RDA00016) recommend that one provide a PID for the overall collection and then assign a PID to any arbitrary subset as obtained through a query. So a citation of a subset would have two PIDs: one for the collection and one for the subset.
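
As a rough sketch of the two-PID idea (an illustration only, not the RDA reference implementation): the query that produced the subset is normalized, combined with its execution timestamp, and given its own identifier alongside the collection PID. In practice a registrar would mint the subset PID; here a truncated hash stands in for it, and the collection PID, parameter names, and function names are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

# Placeholder collection identifier, invented for illustration.
COLLECTION_PID = "10.0000/EXAMPLE-COLLECTION"


def normalize_query(params: dict) -> str:
    """Serialize query parameters deterministically so equal queries compare equal."""
    return json.dumps(params, sort_keys=True, separators=(",", ":"))


def subset_identifier(collection_pid: str, params: dict, executed_at: datetime) -> str:
    """Derive a stand-in identifier for a subset from its query and timestamp."""
    payload = "|".join([collection_pid, normalize_query(params), executed_at.isoformat()])
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return f"{collection_pid}/subset-{digest}"


def cite_subset(collection_pid: str, params: dict, executed_at: datetime) -> dict:
    """Return both identifiers a subset citation would carry."""
    return {
        "collection_pid": collection_pid,
        "subset_pid": subset_identifier(collection_pid, params, executed_at),
        "query": normalize_query(params),
        "executed_at": executed_at.isoformat(),
    }


if __name__ == "__main__":
    when = datetime(2020, 12, 22, 14, 42, tzinfo=timezone.utc)
    # Hypothetical query parameters for a station time series.
    print(cite_subset(COLLECTION_PID, {"station": "XXXX", "start": "2019-01-01", "end": "2019-12-31"}, when))
```

Keeping the timestamp in the identifier means the same query run against a later state of the stream gets a different subset identifier, which is the point of citing dynamic data this way.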

None of this has much to do with credit, however. For many time series the credit is the same for the collection and the granule or subset, but in some cases different individuals may be responsible for different granules within a collection and should therefore be credited accordingly. There has been a little work showing how the RDA methodology can be used to do this, but that was not the original intent of the Recommendation.

What we are finding as we go through this exercise is that credit is a human concern and often requires human judgement. We discuss this a bit in Parsons, M. A., R. E. Duerr, and M. B. Jones. 2019. “The history and future of data citation in practice.” Data Science Journal 18. https://doi.org/10.5334/dsj-2019-052

cheers,

-m.

On 21 Dec 2020, at 10:59, Robert Casey via Esip-citationguidelines <esip-citationguidelines at lists.esipfed.org> wrote:

Hi Mark-

Looking at the matrix for group 2, I am wondering where datasets from DOI-designated data sources that are continuous data streams (often spanning years) would fit in? No user will access the entirety of the dataset unless it is finite in time. Would this fall under granules?

If so, I see Rama's suggestion that each granule should be uniquely identified so as to be reproducible. I agree that this is the ideal, but not always a practical first step. Should we have an additional level of citation that acknowledges the data source as a whole, without identifying each extracted data granule?

Thank you!

-Rob

Message: 1
Date: Sun, 20 Dec 2020 14:11:42 -0700
From: Mark Parsons <parsonsm.work at icloud.com>
To: Esip-citationguidelines
<esip-citationguidelines at lists.esipfed.org>
Subject: [Esip-citationguidelines] citation cluster status
Message-ID: <DA8FA778-93B2-4F2C-9398-E71593D1F8E4 at icloud.com>
Content-Type: text/plain; charset="utf-8"

Friends,

I have finally put some crude notes together from our November meeting: https://docs.google.com/document/d/18ooEixbchKp-qgAG7qtnebKrWDYsutt4d2eX3HsaWls/edit

Note we have our next meeting to finalize our plans for the winter meeting on 7 Jan. (Invite coming)

The “Winter Mtg 2021” tab of the matrix <https://docs.google.com/spreadsheets/d/1VEYPLgTsCR_zbMUbThonBrqaYqBiMT4e525NzFi7ql8/edit?usp=sharing> shows what I’ve done.

Artifact class leads (Dan, Ruth, Sarah, Nancy, Mark) should try to complete the matrix as much as possible in advance. I tried to do it for the data cluster (Madison, please review). I hope that provides a guide.

Talk to y’all on the 7th in what we hope will be a much better year.

Merry, merry,

-m.

