[Esip-preserve] Data Citation Guidelines at Monday's Stewardship call

Parsons, Mark parsom3 at rpi.edu
Fri Oct 12 09:50:57 EDT 2018

Esteemed colleagues,

An item on Monday’s agenda is the revision to the data citation guidelines. Here are some more particulars.

A.  I strongly encourage y’all to contribute to the current document in progress at
We especially need real-world examples!

It’s in suggestion mode so comment and suggest away.

B.  Relation to software citation effort.
ESIP’s software citation cluster is working toward a pre-AGU release and possible press release. Their current (in flux) version is here:

We’ve agreed that ultimately we want one citation/referencing/tracking scheme for all interesting science objects, but that will take time. Do we want to try to combine efforts now, or should we all just proceed apace? We might want to revisit after the following discussion.

C.  Adherence to current standards
When we last did this the DataCite schema was at version 2.2. Now they’re up to 4.1. The most significant changes seem to have occurred since version 4. I summarize below and italicize some issues we may need to discuss.

DataCite Metadata Schema 4.0
Released 19 Sep 2016. Changes in this version include:

• Changing resourceTypeGeneral from optional to mandatory
• Addition of a new property: FundingReference, with subproperties funderName, funderIdentifier, awardNumber, awardURI and awardTitle. Deprecation of contributorType “funder”
• Addition of new optional subproperties for creatorName and contributorName: familyName and givenName
• Addition of a new relatedIdentifierType option “IGSN”
• Addition of a new subproperty for GeoLocation “geoLocationPolygon”, and changing the definition of the existing GeoLocation subproperties (geoLocationPoint and geoLocationBox)
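
To make the 4.0 changes a bit more concrete, here is a minimal Python sketch of a record that uses the new familyName/givenName and FundingReference properties and is checked for the mandatory ones. The flat dict layout, the function name missing_mandatory, and every value in the record (including the DOI) are made up for illustration; DataCite itself defines an XML schema, so treat this as a rough picture rather than the schema's actual structure.

```python
# Hypothetical flat layout; the property names follow the DataCite 4.0 list
# above, but the dict structure itself is an assumption for illustration.
MANDATORY = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceTypeGeneral",  # optional before 4.0, mandatory from 4.0 on
)


def missing_mandatory(record):
    """Return the mandatory property names that are absent or empty."""
    return [name for name in MANDATORY if not record.get(name)]


# All values below are placeholders, not a real data set.
record = {
    "identifier": {"identifier": "10.1234/example", "identifierType": "DOI"},
    "creators": [
        {"creatorName": "Doe, Jane", "givenName": "Jane", "familyName": "Doe"}
    ],
    "titles": ["Example Snow Depth Data Set"],
    "publisher": "Example Data Center",
    "publicationYear": 2018,
    # New in 4.0: replaces the deprecated contributorType "funder".
    "fundingReferences": [
        {"funderName": "Example Funder", "awardNumber": "ABC-123"}
    ],
}

print(missing_mandatory(record))
```

A record written against the older schema, like the one above, would now be flagged for its missing resourceTypeGeneral, which is the kind of gap our own mandatory content may need to account for.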

DataCite Metadata Schema 4.1
Released 23 Oct 2017. Changes in this version include:

• Allowing multiple polygons per GeoLocation
• Addition of new optional “inPolygonPoint” subproperties for polygon
• Addition of new dateType “Other”
• Addition of a new resourceType “DataPaper”
• Addition of three new relationType pairs: IsDescribedBy and Describes, HasVersion and IsVersionOf, IsRequiredBy and Requires
• Addition of new subproperty for Date: dateInformation
• Addition of a new optional attribute for creatorName and contributorName: nameType. Controlled list: personal, organizational
• Addition of a new optional “resourceTypeGeneral” attribute for relatedIdentifier. Controlled list is identical to existing resourceTypeGeneral attribute
• Addition of optional lang attribute to Rights property
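
The three new relationType pairs are reciprocal, which matters if we recommend linking versions or data papers in both directions. A minimal sketch of that idea in Python follows; the lookup table and the function name inverse_relation are hypothetical helpers, and only the six relationType values come from the 4.1 list above.

```python
# The three relationType pairs added in DataCite 4.1 (from the list above).
PAIRS_4_1 = [
    ("IsDescribedBy", "Describes"),
    ("HasVersion", "IsVersionOf"),
    ("IsRequiredBy", "Requires"),
]

# Build a two-way lookup so either member of a pair finds its partner.
INVERSE = {}
for forward, backward in PAIRS_4_1:
    INVERSE[forward] = backward
    INVERSE[backward] = forward


def inverse_relation(relation_type):
    """Return the reciprocal relationType; raises KeyError if unknown."""
    return INVERSE[relation_type]


# If a data set record says "IsDescribedBy" a data paper, the data paper's
# record can carry the reciprocal "Describes" link back to the data set.
print(inverse_relation("IsDescribedBy"))
```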

D. Mapping to dialects

Ted H. did a mapping to current metadata content dialects (i.e., standards), which I have put in the document. Did he get it right? Do we need to revise our mandatory content? There was some discussion in an email thread back at the end of August. Ted provided some general notes, which I copy here with some comments:

Version: I map version to edition as the definition of edition is “a particular form or version of a published text” and the definition of version is “a particular form of something differing in certain respects from an earlier form or other forms of the same type of thing”.

Title: unambiguously maps to title rather than edition

Archive or Distributor: the mapping to publisher is very citation centric, i.e. it exists in standard citations when it is a simple name of the publisher. In this case, the publisher is included in the citation. Archive or distributor is a bit different, more like a URL which is the next item. In that case, this information goes in a different part of the metadata record.

Archive or distributor is not like a URL. It is an institutional role that is providing some level of imprimatur or authority behind the data. It also gives an indication of the level of preservation. It is also an important aspect for the credit function of citation. Publisher is the wrong term, because publishers don’t preserve.

Locator, Identifier, or Distribution Media: This item is overloaded at this point. The DataCite identifier is different than the onlink or othercit. I imagine identifier will be broken out into a single field and locator will be merged with the Archive or Distributor… BTW, having an identifier for anything (resource, individual, or organization) requires an identifierType of some sort these days.

Identifier type is needed, especially if the guidelines become more generic. See also note above about DataCite compliance. I’m open to separating out these concepts, but I think they are all needed.

Editor: Can’t use the same CSDGM element for editor and author. Also, if contributor is used, must have contributor role defined somehow.

OK. What’s the fix?

Publication Place: CSDGM mapping seems right; description does not really work for DataCite.

Distributor, Associate Archive, or other Institutional Role: How is this “Distributor” different than the one listed above? It is interesting that the CSDGM mappings are different. “Othercit” is more incorrect than publish.

Yeah, the language is confusing. This was meant for the example where the formal archive data of record is held by one institution, but the data are distributed by another institution. Usually for analog/digital collections. I think we still need the row.

Data within a larger work: othercit is for other citation information about citing the resource. lworkcit cites the larger work. When we use something like RelatedIdentifier we must specify attributes (e.g. relationType) to be useful.

OK. We need this row in the table.
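
Ted’s point about RelatedIdentifier needing its attributes can be sketched roughly as follows. This is only an illustration in Python: the class, its field names, and the describe helper are hypothetical, the DOI is a placeholder, and "IsPartOf" is used as the relationType on the assumption that it is the natural fit for data within a larger work.

```python
from dataclasses import dataclass


@dataclass
class RelatedIdentifier:
    """Hypothetical stand-in for a DataCite RelatedIdentifier entry."""

    value: str
    related_identifier_type: str  # e.g. "DOI", "URL", "IGSN"
    relation_type: str  # e.g. "IsPartOf"

    def __post_init__(self):
        # A bare identifier string is not useful; insist on all attributes.
        for field_name in ("value", "related_identifier_type", "relation_type"):
            if not getattr(self, field_name):
                raise ValueError(f"{field_name} is required")


def describe(link):
    """Render the link as 'relationType identifierType:value'."""
    return f"{link.relation_type} {link.related_identifier_type}:{link.value}"


# Placeholder DOI standing in for the larger work that contains the data.
part_of = RelatedIdentifier("10.1234/larger-work", "DOI", "IsPartOf")
print(describe(part_of))
```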

General note: mapping to characterStrings. When we map something to a characterString (e.g. Description) it will be difficult to use if 1) other things are there and/or 2) there is no specific guidance on how to put that information in the string.

That is all for now.


