[Esip-citationguidelines] Esip-citationguidelines Digest, Vol 24, Issue 3

Tue Dec 22 18:10:14 EST 2020

	Hi Lesley-
> 
> I am doing some work on MT data in Australia, and what I think we are going to move towards is a DOI for each individual station in an MT survey/network, but that is a very confronting suggestion to some geophysicists so it is softly, softly at the moment. The idea is that when the stations are aggregated into a dataset, this also gets a DOI.  

	This certainly sounds like a good way to go, given that you have a plan for aggregating a collection of stations.  We have found ourselves very reliant on organized governance to get DOIs for networks in place (it's their network, not ours or the Federation's), so just getting that level of registration has been a task.  Referencing stations happens indirectly through our own database lookups (and those of other federated repositories), so the information is there, even though we don't place a PID on the instrument itself.
 
> The UNAVCO people presented this work on DOIs at either the 2018 or 2019 AGU meeting and I tried to get them to publish something on their composite and aggregate DOIs, but I can’t see that they have done this yet (it’s like trying to get IRIS to publish a referenceable paper on their ‘Dirt-to-Desktop’ concept – hint, hint).

	I think you may find that these efforts have been 'work in progress' and have seen some success, but are perhaps not yet ready to publish as a finished solution.  I couldn't find a direct reference, either.  Indeed, the new IRIS MT facility is in the process of designing an effective dirt-to-desktop workflow to serve these datasets.  

	The closest to dirt-to-desktop that we have for seismic is our PH5 dataset pipeline for temporary experiments.

https://www.passcal.nmt.edu/content/ph5-what-it
http://service.iris.edu/ph5ws/

	This involves a partnership between the instrumentation center, the data repository, and the PI to provide tools that allow the PI to largely author and maintain their datasets independently and convey them to the data repository after a series of validation checks.  The main idea is to have less of a middle-process that stands between the PI and getting data pushed out for dissemination.  It should be noted that we are taking what we have learned from PH5 and exploring an evolution in formatting, which means that someday we will see these datasets transition away from PH5.

>  
> This way with something like CRediT we can finally start to acknowledge the people that go out in the field and dig the holes and actually collect the data, and more importantly, recognise those who funded the data collection initiative.  Once you go into the more highly evolved data products, these people are rarely if ever citable in a machine-readable way (if you are lucky it is in free text in the acknowledgements).

	One thing we are doing, and will continue to develop, is making it easier for scientists to get a citation for datasets based on the networks they are accessing.  Right now, it's a simple web tool and later we will make it more service-oriented.  

	https://fdsn.org/networks/citation/

	In addition, we are now inserting DOIs for networks into our XML metadata, so the attribution is carried there when the user requests it.  Our issue now is to engage all of the FDSN networks to ensure that they have an associated DOI.  There are many that do not have one or have not registered one.
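
	As a rough illustration of how a user might pick up that attribution, here is a minimal Python sketch that reads network DOIs out of a metadata document.  It assumes the metadata is FDSN StationXML and that the DOI is carried in an Identifier element with type="DOI" under each Network; the sample document, the network codes XX and YY, the DOI value, and the function name network_dois are all made up for the example and do not describe our production output.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for FDSN StationXML documents.
NS = {"s": "http://www.fdsn.org/xml/station/1"}

# Hypothetical sample document: network codes and DOI are placeholders.
SAMPLE = """<FDSNStationXML xmlns="http://www.fdsn.org/xml/station/1" schemaVersion="1.1">
  <Source>example</Source>
  <Created>2020-12-22T00:00:00Z</Created>
  <Network code="XX">
    <Description>Hypothetical network with a registered DOI</Description>
    <Identifier type="DOI">10.0000/example-network</Identifier>
  </Network>
  <Network code="YY">
    <Description>Hypothetical network without a registered DOI</Description>
  </Network>
</FDSNStationXML>
"""


def network_dois(xml_text):
    """Map each network code to its DOI, or None if no DOI is present."""
    root = ET.fromstring(xml_text)
    result = {}
    for net in root.findall("s:Network", NS):
        doi = None
        for ident in net.findall("s:Identifier", NS):
            # Assumption: the DOI is flagged by type="DOI" on the element.
            if ident.get("type", "").upper() == "DOI" and ident.text:
                doi = ident.text.strip()
                break
        result[net.get("code")] = doi
    return result


if __name__ == "__main__":
    for code, doi in network_dois(SAMPLE).items():
        print(code, doi if doi else "(no DOI registered)")
```

	Networks that come back with no DOI are the ones we still need to engage.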

>  
> Critical to this is Data Versioning and the NASA processing levels – have you seen the outputs of the RDA Data Versioning Working Group? This WG produced a white paper <https://rd-alliance.org/group/data-versioning-wg/outcomes/principles-and-best-practices-data-versioning-all-data-sets-big> based on 39 use cases <https://rd-alliance.org/group/data-versioning-wg/outcomes/compilation-data-versioning-use-cases-rda-data-versioning-working>.

	I have read some of the material coming out of the RDA and looked at one of the papers you referenced here.  There are a lot of good conceptualizations here.  I still think that each data repository will see the conditions and criteria differently in terms of versioning, what constitutes a dataset, and how metadata changes affect the identity of the dataset as a whole.  In addition, the infrastructural and cooperative demands of detailed PIDs for data and organizations will be formidable for any data center to take on.  I do think we can strive for these goals in increments, though.


	-Rob