[Esip-documentation] ACDD: creator, project, institution - 2

John Graybeal via Esip-documentation esip-documentation at lists.esipfed.org
Fri Sep 26 18:05:50 EDT 2014


Bob,

I'm just answering these two most recent questions for the moment, to introduce to everyone the types of use cases many of us have had to deal with. 

This is not meant to argue with your position -- in a little bit I'll offer an overall option to address that, once I have it fully laid out. So please hold off on restating your position in response until I've had a chance to address it directly. Thanks!

> Isn't it clear that creator_name is for the data creator's name?

The question that comes up for a user is "what does 'data creator' mean for our situation?" See my use case below.

If you look at the ISO Translation Notes in ACDD 1.1, you can see the same issues popping up, where creator_name is mapped to 3 different roles: citation/citedResponsibleParty role=originator, point of contact, and metadata contact. This (one name for multiple meanings) wasn't working for our use cases.

> Isn't it clear that institution and project are the creator's institution and project?

No, it isn't. Many data sets are created by folks who don't have an institution or have multiple institutions, or are created by an institution (not a user) that has nothing to do with the actual final presentation or intellectual ownership of the data set.

Here's a use case I had to deal with:

1) Someone makes an observation with their custom instrument. Call her A. Her institution is Z.
2) The data from the instrument is collected by an ocean observing system called B (without which the data could not be collected), and is published by that system. Let's call that the raw data. B doesn't have an institution (really, it is built by dozens of institutions).
3) A standard process in the ocean observing system (written by team C from institution X) takes the raw data and performs automated QC checks. Let's call that QC data.
4) A second process in the ocean observing system (written by contributor D from institution W) takes the QC data, interpolates it, and combines it with other data to create a grid. Let's call that gridded data.
5) An individual E (from institution V) takes a subset of the gridded data and publishes that as part of their paper, also submitting it back to the observatory as a reprocessed data set. 
6) The ocean observatory publishes the newly submitted, reprocessed data set. Let's call that the curated reprocessed data.
7) A republishing system (like ERDDAP), call it F, out of institution U, takes the curated reprocessed data, and re-offers it in multiple formats. Let's call that the reformatted data.

If I make a table for each of those products 1 through 7, what do you think the creator_name is for each? And is the institution always the creator's institution?

For us, these answers were not obvious from the existing names or definitions. That's what motivated us to spend so much time trying to make improvements.
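
To make the ambiguity concrete, here is a minimal sketch of the seven-step chain above as a small table in Python. The class name, field names, and the helper function are hypothetical, chosen only for illustration; they are not part of ACDD. Two readings are assumptions: step 3 is attributed to team C (who wrote the process) rather than to system B (which runs it), and the observatory in step 6 is taken to be the same system B as in step 2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Product:
    """One step in the use case (hypothetical structure, not an ACDD type)."""
    step: int
    label: str
    actor: str                  # who performed or wrote this step
    institution: Optional[str]  # None where the actor has no single institution


# The seven steps, using the letters from the use case.
CHAIN = [
    Product(1, "observation", "A", "Z"),
    Product(2, "raw data", "B", None),
    Product(3, "QC data", "C", "X"),                   # assumption: credit the process author
    Product(4, "gridded data", "D", "W"),
    Product(5, "reprocessed subset", "E", "V"),
    Product(6, "curated reprocessed data", "B", None), # assumption: observatory == B
    Product(7, "reformatted data", "F", "U"),
]


def candidate_creators(step: int) -> List[str]:
    """Actors who touched the data up to and including `step`, in first-seen order.

    Each one is a defensible reading of "data creator" for that product.
    """
    seen: List[str] = []
    for product in CHAIN:
        if product.step > step:
            break
        if product.actor not in seen:
            seen.append(product.actor)
    return seen


if __name__ == "__main__":
    for product in CHAIN:
        names = ", ".join(candidate_creators(product.step))
        print(f"{product.step}. {product.label}: creator_name could be any of [{names}]")
```

By the last product, six different parties are plausible values for a single creator_name attribute, and two of the seven steps have no institution to report at all.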

John
 

On Sep 26, 2014, at 12:50, Bob Simons - NOAA Federal via Esip-documentation <esip-documentation at lists.esipfed.org> wrote:

> I should have included this formatted snippet from the original ACDD 1.0 at 
> http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html :
> 
> 
> creator_name   ->  metadata/creator/name
> creator_url    ->  metadata/creator/contact@url
> creator_email  ->  metadata/creator/contact@email
>     The data creator's name, URL, and email. The "institution" attribute will be used if the "creator_name" attribute does not exist.
> institution    ->  metadata/creator/name
> project        ->  metadata/project
>     The scientific project that produced the data.
>
> Isn't it clear that creator_name is for the data creator's name?
> Isn't it clear that institution and project are the creator's institution and project?
> 
> Nan, if someone misread this and used creator_name, institution, or project for some person/group other than the creator, then that is their mistake.  There is no need to deprecate these attribute names and create new attribute names just because someone misread/misused these attributes. That is their mistake. Tell them so they can fix it.
> 
> 
> On 2014-09-26 12:14 PM, Bob Simons - NOAA Federal wrote:
>> 
>> On 2014-09-26 11:47 AM, Nan Galbraith wrote:
>>> Hi all - 
>>> 
>>> If you're using an ACDD version number in your metadata, these 
>>> updates shouldn't cause any problems for you. If you eventually 
>>> decide that there's some value in the revised spec, you can adopt 
>>> the changes, otherwise your data & code should be just fine. 
>> What about those other standards (e.g., IOOS Gliders) that have adopted ACDD 1.0 attribute names but in the future would like to add some of the new ACDD 1.3 attribute names?
>> 
>>> 
>>> IMHO, changing the definitions of the terms in a standard is far 
>>> worse than changing the terms themselves. Creator_name was 
>>> originally defined as "data creator's name" - really no definition 
>>> at all. If we change that to add any meaning (one who originally 
>>> collected the data, or who created the data file?) we could make 
>>> metadata in existing data sets incorrect. 
>> Only if someone really misread the 1.0 definitions.
>> 
>> 1.0 says "data creator's name" not "data file creator's name". It is the person/group that created the data. 
>> You're taking an odd reading of the definition, or some group's misuse of the existing name and definition, and saying that that justifies deprecating the attribute.
>> 
>> And the "institution" definition is grouped with creator, so it is clearly the data creator's institution.
>> 
>> And "project" is defined in ACDD 1.0 as "The scientific project that produced the data."   (Not, e.g., the group that published the data.) It is clearly not the publisher's project.
>> 
>> So there is no need to change the existing attribute names.
>> 
>>> 
>>> Other than that, though, I see your point, and sympathize with 
>>> your reluctance to change these terms.  I also agree that we may have 
>>> changed some terms without strictly needing to,
>> Exactly.
>>> but in the case of 
>>> creator_project and creator_institution (vs project and institution) I 
>>> think that allows for documenting other projects and institutions - 
>>> e.g. the project/institution that processes, aggregates, and/or 
>>> distributes a dataset might want some visibility for their efforts. In 
>>> the original version, they got that only at the expense of the originator's 
>>> information. 
>> Fine. Then leave project and institution as is for the creator, but add publisher_project and publisher_institution.
>> 
>>> 
>>> I've seen the effect of this many times, where data collected by a PI in 
>>> my group appears on a portal with only the name and institution of 
>>> the last person to handle it. Or, when I send my data to OceanSITES for 
>>> distribution,  I'd like the OceanSITES project to be part of the metadata, 
>>> but not to remove the original information about the person and project 
>>> that collected and originally provided the data. Having creator_project 
>>> and _institution as named fields makes this information more likely to 
>>> be preserved as it should be. 
>> I understand. Leaving project and institution as is for the creator, and adding publisher_project and publisher_institution offers a viable solution.
>> 
>>> 
>>> Regards - 
>>> Nan 
>>> 
>>> 
>>> On 9/26/14 9:46 AM, Nancy Ritchey - NOAA Federal via Esip-documentation wrote: 
>>>> Bob, 
>>>> Well said!  I agree with your assessment.  We've spent many years working with our providers to use these standards appropriately allowing the use of common tools across multiple platforms and communities.  Changing the standard as proposed will have many unintentional consequences that may negate its future use.  A thoughtful, practical solution is needed. 
>>>> Nancy Ritchey 
>>>> 
>>>> ---------- Forwarded message ---------- 
>>>> From: Bob Simons - NOAA Federal via Esip-documentation <esip-documentation at lists.esipfed.org>
>>>> Date: Thu, Sep 25, 2014 at 7:03 PM
>>>> Subject: [Esip-documentation] ACDD: creator, project, institution
>>>> To: John Graybeal <john.graybeal at marinexplore.com>, ESIP Documentation <esip-documentation at lists.esipfed.org>
>>>> 
>>>> 
>>>> I'm sure I'm coming late to this discussion: 
>>>> 
>>>> Why does ACDD 1.3 have creator, not creator_name, like 1.0? 
>>>> Why does ACDD 1.3 have creator_project, not project, like 1.0? 
>>>> Why does ACDD 1.3 have creator_institution, not institution (which is in CF!), like 1.0? 
>>>> If you want to add creator_institution_info, why not just add institution_info? 
>>>> 
>>>> It seems like these changes are just to change to names that the new ACDD group prefers, but at a HUGE cost. 
>>>> I have 1000's of datasets that have creator_name, project, and institution attributes. 
>>>> I have written software, ERDDAP, that strongly recommends creator_name and requires institution. 
>>>> I have told numerous people and groups to follow the ACDD standard. 
>>>> Now you are breaking your own standard. 
>>>> The new ACDD group seems to think there are no consequences to changing attribute names and that it can be done just to suit the group's fancy. 
>>>> It doesn't matter if you or I think the new names are better. That is not the issue.  If you are unhappy with the old system, change the definitions to clarify the attribute's usage, don't change the attribute names. Changes that break the old standard are wrong, wrong, wrong. 
>>>> And no, saying that all attributes are optional doesn't make it okay to change the attribute's names. If ACDD says that the data creator's name is in an attribute called creator_name, then that is where it should be (last year, this year, next year, and in 50 years). 
>>>> 
>>>> --- 
>>>> Standards should be backwards compatible. 
>>>> Standards should be as stable as possible. 
>>>> ACDD should be cleaning up the definitions of existing attributes and sparingly adding new attributes that provide a place for new pieces of information, NOT changing existing attribute names. 
>>>> 
>>>> 
>>>> -- 
>>>> Sincerely, 
>>>> 
>>>> Bob Simons 
>>>> 
>>> 
>>> 
>> 
>> -- 
>> Sincerely,
>> Bob Simons 
>> IT Specialist 
>> Environmental Research Division 
>> NOAA Southwest Fisheries Science Center 
>> 1352 Lighthouse Ave 
>> Pacific Grove, CA 93950-2079 
>> Phone: (831)333-9878 (Changed 2014-08-20) 
>> Fax: (831)648-8440 
>> Email: bob.simons at noaa.gov 
>> 
>> The contents of this message are mine personally and 
>> do not necessarily reflect any position of the 
>> Government or the National Oceanic and Atmospheric 
>> Administration. 
>> <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
>> 
> 
> -- 
> Sincerely,
> Bob Simons 
