[Esip-documentation] proper usage of ACDD for derived works

Fri Jan 20 21:09:19 EST 2017

Hello everyone, 

I’ve received an email with ACDD questions from Dale Robinson, which I’m copying here (along with my own take on the answers) for the experts to comment on.

> As part of my job, I make satellite data products from source datasets provided by JPL, NOAA...
> I'd like to make the metadata correct and give credit to the original data creators. However, I am confused about which attributes apply to our group versus the original data creators. I'd appreciate it if you could help me with this or point me to a person or resource that could.
> 
> I have a rambling bunch of questions below to illustrate my confusion. 
> 
> So, for example, if I take daily Pathfinder SST files from JPL, generate multi-day composites (like an 8-day composite), and make them available via our ERDDAP server:
> 
> publisher attribute fields: I'm pretty sure this is us, ERD
> The person (or other entity specified by the publisher_type attribute) responsible for publishing the data file or product to users, with its current metadata and format.

Correct.

> creator attribute fields: my guess is that this is ERD too, since we are modifying the data and JPL may not wish to be blamed for what ERD did. 
> 'The person (or other creator type specified by the creator_type attribute) principally responsible for creating this data'
> 
> Or is JPL the creator and ERD a contributor?
> contributor_name and contributor_role

ERD is the creator of the multi-day composites, so your first guess is correct. 

The ‘contributor_’ items are to indicate additional contributors, for example members of your team or other experts who participated in the conversion process. I think the argument can be made either way as to whether to include JPL in the list of contributors; to me, it seems appropriately generous to do so.
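As a minimal sketch of the attribution answers above, assuming Python with a plain dictionary standing in for the netCDF global attributes: the helper name `derived_attribution`, the "institution" type values, and the role "originator" (a code from the ISO 19115 CI_RoleCode list) are illustrative assumptions, not values ACDD prescribes for this case.

```python
def derived_attribution(publisher, creator, contributors):
    """Build ACDD attribution attributes for a derived product.

    contributors is a list of (name, role) pairs. They are stored as two
    parallel comma-separated attributes, with roles in the same order and
    number as names.
    """
    names = [name for name, _ in contributors]
    roles = [role for _, role in contributors]
    return {
        "publisher_name": publisher,      # who serves the file as it is now
        "publisher_type": "institution",
        "creator_name": creator,          # who made the derived composite
        "creator_type": "institution",
        "contributor_name": ", ".join(names),
        "contributor_role": ", ".join(roles),
    }


# Hypothetical usage: ERD publishes and creates the 8-day composite and
# credits JPL, the source of the daily files, as a contributor.
attrs = derived_attribution(
    publisher="ERD",
    creator="ERD",
    contributors=[("JPL", "originator")],
)
print(attrs)
```

Keeping names and roles as paired tuples until the last step makes it hard for the two comma-separated lists to drift out of order.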

> source: Should this describe what JPL did, since the definition (below) refers to the original data?
> 'The method of production of the original data. If it was model-generated, source should name the model and its version, as specifically as could be useful. If it is observational, source should characterize it (e.g., "surface observation" or "radiosonde").'

As ‘source’ comes directly from the CF conventions, I went back to those definitions, and I’m still not sure. (See also the next two items.) If no one on this list sounds authoritative, I suggest you send a question to the CF conventions list (mailto:cf-metadata at cgd.ucar.edu, subscribe at http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata). Actually, I’ll do that now, just so the CF folks are aware of the question.

> institution: JPL, since the definition refers to the original data
> 'Specifies where the original data was produced'

This will have the same answer as source. I would have agreed with you, but the history definition (next) seems to suggest otherwise.
> 
> history: where we describe modifications to the original data. 
> 'Provides an audit trail for modifications to the original data. Well-behaved generic netCDF filters will automatically append their name and the parameters with which they were invoked to the global history attribute of an input netCDF file. We recommend that each line begin with a timestamp indicating the date and time of day that the program was executed’

So the relevant CF section for source, institution, and history is http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/ch02s06.html#description-of-file-contents. While the definition for institution (‘Specifies where the original data was produced’) is almost certainly describing the current file’s institution, history is ‘an audit trail for modifications to the original data’. If this file _is_ the original data, I don’t see how there can be any modifications to it yet, so the two definitions seem to use ‘original data’ differently. I would call this a CF issue (and, as I recall, we of ACDD carefully avoided messing with CF definitions), so maybe that group needs to weigh in.
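Whatever "original data" turns out to mean, the timestamp recommendation in the history definition is easy to follow mechanically. A minimal sketch, assuming Python with a plain dict standing in for netCDF global attributes; `append_history`, the ISO 8601 UTC timestamp format, and the example history and command strings are hypothetical. It keeps whatever history came with the source file and adds the new processing step at the end.

```python
from datetime import datetime, timezone


def append_history(attrs, command, now=None):
    """Append one timestamped line to a CF-style global 'history' attribute.

    Any history already present (for example, inherited from the source
    files) is preserved; the new line goes at the end and begins with a
    UTC timestamp, following the CF recommendation.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    now = now.astimezone(timezone.utc)  # normalise so the 'Z' suffix is true
    line = f"{now.strftime('%Y-%m-%dT%H:%M:%SZ')} {command}"
    previous = attrs.get("history", "")
    attrs["history"] = f"{previous}\n{line}" if previous else line
    return attrs


# Hypothetical usage: both strings are placeholders, not real JPL or ERD logs.
attrs = {"history": "2016-12-01T00:00:00Z original JPL processing"}
append_history(attrs, "make_composite --days 8 daily_sst_*.nc",
               now=datetime(2017, 1, 20, 21, 9, 19, tzinfo=timezone.utc))
print(attrs["history"])
```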

> Do id and naming_authority remain as with the JPL files?

These both refer to the current data set that you produce. So ideally you would have a unique path for your ERD organization and a unique identifier for your created data set, which together could form an IRI (formerly URI, formerly URL/URN) that could be resolved into the page for your data set. That is, mostly, what the definitions below describe.

> id	
> An identifier for the data set, provided by and unique within its naming authority. The combination of the "naming authority" and the "id" should be globally unique, but the id can be globally unique by itself also. IDs can be URLs, URNs, DOIs, meaningful text strings, a local key, or any other unique string of characters. The id should not include white space characters.
> 
> naming_authority	
> The organization that provides the initial id (see above) for the dataset. The naming authority should be uniquely specified by this attribute. We recommend using reverse-DNS naming for the naming authority; URIs are also acceptable. Example: 'edu.ucar.unidata'.

Hope that helps. 

John

---------------------------------------
John Graybeal
jbgraybeal at mindspring.com
linkedin: http://www.linkedin.com/in/johngraybeal/
