[Esip-documentation] Fwd: Re: ACDD: creator, project, institution - 2

Bob Simons - NOAA Federal via Esip-documentation esip-documentation at lists.esipfed.org
Mon Sep 29 10:58:35 EDT 2014


My replies are interspersed.


On 2014-09-26 3:05 PM, John Graybeal wrote:
> Bob,
>
> I'm just answering these two most recent questions for the moment, to 
> introduce to everyone the types of use cases many of us have had to 
> deal with.
>
> This is not meant to argue with your position -- in a little bit I'll 
> offer an overall option to address that, once I have it fully laid 
> out. So please hold off on restating your position in response, until 
> I've had a chance to address directly. Thanks!

>
>> Isn't it clear that creator_name is for the data creator's name?
>
> The question that comes up for a user is "what does 'data creator' 
> mean for our situation?" See my use case below.
>
> If you look at the ISO Translation Notes in ACDD 1.1, you can see the 
> same issues popping up, where creator_name is mapped to 3 different 
> roles: citation/citedResponsibleParty role=originator, point of 
> contact, and metadata contact. This (one name for multiple meanings) 
> wasn't working for our use cases.
creator_name should be mapped to originator. There is no need to change 
this attribute's name.
If you want a separate point of contact, then add data_contact_name and 
data_contact_email to ACDD 1.3.
If you want a separate metadata contact that isn't the publisher, then 
add metadata_contact_name and metadata_contact_email to ACDD 1.3.
Changing creator_name to creator does not solve the 
one-name-for-multiple-meanings problem. Adding other x_name and x_email 
does.
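
To make the one-attribute-set-per-role idea concrete, here is a minimal Python sketch. It is only an illustration: data_contact_* and metadata_contact_* are the additions proposed above (they are not in ACDD 1.0), the role labels are informal stand-ins for the ISO roles John lists, and roles_for is a hypothetical helper, not part of any ACDD or ISO tool.

```python
# Hypothetical mapping from attribute prefix to a single role.
# creator_* is in ACDD 1.0; data_contact_* and metadata_contact_* are only
# the additions proposed in this message.
ROLE_BY_PREFIX = {
    "creator": "originator",
    "data_contact": "point of contact",
    "metadata_contact": "metadata contact",
}

SUFFIXES = ("_name", "_email", "_url")


def roles_for(attrs):
    """Group x_name / x_email / x_url global attributes by role."""
    roles = {}
    for key, value in attrs.items():
        for suffix in SUFFIXES:
            prefix = key[: -len(suffix)]
            if key.endswith(suffix) and prefix in ROLE_BY_PREFIX:
                role = ROLE_BY_PREFIX[prefix]
                roles.setdefault(role, {})[suffix[1:]] = value
    return roles


# Placeholder values for illustration only.
example = {
    "creator_name": "A",
    "data_contact_name": "Help Desk",
    "data_contact_email": "help@example.org",
}
contacts = roles_for(example)
```

Each prefix maps to exactly one role, so adding a new role means adding a new prefix rather than renaming an existing one.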

>
>> Isn't it clear that institution and project are the creator's 
>> institution and project?
>
> No, it isn't. Many data sets are created by folks who don't have an 
> institution or have multiple institutions, or are created by an 
> institution (not a user) that has nothing to do with the actual final 
> presentation or intellectual ownership of the data set.
Yes, it is. Seriously?!  Read and look at the definition in ACDD 1.0 at
http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html
It is very clear.
I don't care if people were confused. People will always be confused. 
Explain it to them and move on.

institution is clearly defined as the creator's institution.
If you want a separate publisher_institution, fine, add it.
Changing institution to creator_institution does not solve the problem 
of different institutions. Adding other x_institution does.

[Ah. Given all the examples below, let me add one guideline to clarify 
the meaning of creator: I mean the creator of *this* data (to which the 
metadata is attached). If some data is significantly changed (e.g., 
beyond QA/QC, e.g., combined with other data) then it becomes a new 
product and so has a new creator.  The original creators are relegated 
to "history" (literally and figuratively) and/or external documentation. 
This isn't my idea. This is a common practice and common sense. If some 
PI at JPL creates a model combining data from several sources, he is the 
creator of the model data. Perhaps ACDD 1.3 just needs to state this.]

>
> Here's a use case I had to deal with:
My answers are for my proposal:
creator_name, project and institution all apply to the data creator and 
already have that definition in ACDD 1.0.

Add: (see terms added below)

Note: Okay. I'll play this game. But if you're just going to pick this 
apart because you envision the situation slightly differently than I do, 
don't bother.
My point is: All these situations can be handled by attribute names 
already in ACDD 1.0 plus several new attribute names for new items.  
There is no need to deprecate ACDD 1.0 names and replace them just for 
the sake of replacing them.

>
> 1) Someone makes an observation with their custom instrument. Call her 
> A. Her institution is Z.
creator_name=A
institution=Z
> 2) The data from the instrument is collected by an ocean observing 
> system called B (without which the data could not be collected), and 
> is published by that system. Let's call that the raw data. B doesn't 
> have an institution (really, it is built by dozens of institutions).

creator_name=A or B:  (Personally, I vote for A, since B is the 
publisher, but that is certainly determined by mutual agreement between 
A and B ahead of time.)
publisher_name/url/email=from B
(If B is an Ocean Observing System built by a big consortium, the 
OOS certainly has a name. Show me one that doesn't. More likely, 
there are two casually cooperating groups -- fine, list them both.)
Maybe you need to pull in contributor_x from ACDD 1.0. It depends on the 
details.
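
As a sketch of the assignment for steps 1 and 2, the raw data's global attributes could be written as a plain Python dict. A, Z, and B are the placeholder names from the use case, the publisher URL and email are made-up values, and missing_attrs with its EXPECTED list is a hypothetical check, not something ACDD defines.

```python
# Global attributes for the raw data (use case steps 1 and 2).
# A, Z, and B are the placeholder names from the use case; the
# publisher URL and email are made-up values for illustration.
raw_attrs = {
    "creator_name": "A",
    "institution": "Z",  # the creator's institution
    "publisher_name": "B",
    "publisher_url": "https://b.example.org",
    "publisher_email": "info@b.example.org",
}

# Hypothetical minimum set that a data portal might check for.
EXPECTED = ("creator_name", "institution", "publisher_name")


def missing_attrs(attrs, expected=EXPECTED):
    """Return the expected attribute names that are absent or empty."""
    return [name for name in expected if not attrs.get(name)]
```

All of the attribute names used here already exist in ACDD 1.0.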

> 3) A standard process in the ocean observing system (written by team C 
> from institution X) takes the raw data and performs automated QC 
> checks. Let's call that QC data.
Use processing_level from ACDD 1.0 ("A textual description of the 
processing (or quality control) level of the data.")
None of the proposals in ACDD 1.3 deal with this.
There is no need to deprecate attribute names from ACDD 1.0 to deal with 
this.

> 4) A second process in the ocean observing system (written by 
> contributor D from institution W) takes the QC data, interpolates it, 
> and combines it with other data to create a grid. Let's call that 
> gridded data.
"combines with other data" means it is a new product. So,
creator_name=D
creator_url=(if there is an external web page about how this dataset was 
created)
institution=W
history=(information about how this dataset was created)
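
The "new product, new creator" rule for step 4 can be sketched in Python as follows. This is an assumption-laden illustration: derive_product is a hypothetical helper, the attributes are held in a plain dict, and the wording of the history note is made up.

```python
def derive_product(source_attrs, creator_name, institution, step):
    """Return attributes for a new product derived from source_attrs.

    The new product gets its own creator and institution; the previous
    creator is recorded only in the history text.
    """
    new_attrs = dict(source_attrs)  # copy, so the source stays unchanged
    old_creator = source_attrs.get("creator_name", "unknown")
    note = f"{step} (source data created by {old_creator})"
    previous = source_attrs.get("history", "")
    new_attrs["history"] = f"{previous}\n{note}" if previous else note
    new_attrs["creator_name"] = creator_name
    new_attrs["institution"] = institution
    return new_attrs


# Placeholder names A, Z, D, and W come from the use case.
qc_attrs = {"creator_name": "A", "institution": "Z"}
gridded = derive_product(
    qc_attrs, "D", "W", "Interpolated and combined with other data into a grid"
)
```

The original creator A survives in history, while creator_name and institution describe the gridded product itself.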


> 5) An individual E (from institution V) takes a subset of the gridded 
> data and publishes that as part of their paper, also submitting it 
> back to the observatory as a reprocessed data set.
This is pretty vague.
If he didn't reprocess it (and he's just saying he did), the observatory 
shouldn't accept it.
If he did reprocess it:
creator_name=E
institution=V
history= details the original source data and the processing steps he took.
publisher_name/email/url=from B

> 6) The ocean observatory publishes the newly submitted, reprocessed 
> data set. Let's call that the curated reprocessed data.
This is pretty vague. It depends a lot on what "curated" means here. You 
say the published data is as E submitted it, so it sounds lightly 
curated. So:
The same answers as 5.  B is getting the same credit that a book company 
gets for helping an author publish his/her book.

> 7) A republishing system (like ERDDAP), call it F, out of institution 
> U, takes the curated reprocessed data, and re-offers it in multiple 
> formats. Let's call that the reformatted data.
Multiple formats are just different representations of the same data. 
All the data remains the same.
The ERDDAP administrator's name/address/email get added to the ISO 
metadata as the contact for the service (which doesn't replace or change 
the creator or the publisher or other information which is also in the 
ISO metadata).
ERDDAP doesn't do much (especially compared to the creator). It doesn't 
take much credit.
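
A rough Python sketch of that separation is below. It is not how ERDDAP is implemented: reformat_for_service, the record layout, and the contact values are all hypothetical, and the only point is that the dataset's own attributes pass through unchanged while the service contact is kept alongside them.

```python
def reformat_for_service(dataset_attrs, service_contact):
    """Pair unchanged dataset attributes with a separate service contact."""
    return {
        "dataset": dict(dataset_attrs),  # creator and publisher untouched
        "service_contact": dict(service_contact),
    }


# Placeholder names E, V, B, and F come from the use case; the email
# address is a made-up value.
curated = {"creator_name": "E", "institution": "V", "publisher_name": "B"}
record = reformat_for_service(
    curated, {"name": "F administrator", "email": "admin@f.example.org"}
)
```

The republisher adds its own contact without replacing the creator or the publisher.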

>
> If I make a table for each of those products 1 through 7, what do you 
> think the creator_name is for each? And is the institution always 
> creator's institution?
Yes. institution is defined as the creator's institution.
I see that I need to add one guideline to the definition of creator: I mean the 
creator of *this* data (to which the metadata is attached). If some data 
is significantly changed (e.g., beyond QA/QC, e.g., combined with other 
data) then it becomes a new product and so gets a new creator.  The 
original creators are relegated to "history" (literally and 
figuratively) and/or external documentation. This isn't my idea. This is 
a common practice and common sense. If some PI at JPL creates a model 
combining data from several sources, he is the creator of the model 
data. Perhaps ACDD 1.3 just needs to state this.

>
> For us, these answers were not obvious from the existing names or 
> definitions. That's what motivated us to spend this long time trying 
> to make improvements.
All of these situations seemed clear to me (although they are 
hypothetical, so it is easy to pick nits based on a different understanding 
of the situation). I deal with these using ACDD 1.0 all of the time.

>
> John
>
> On Sep 26, 2014, at 12:50, Bob Simons - NOAA Federal via 
> Esip-documentation <esip-documentation at lists.esipfed.org 
> <mailto:esip-documentation at lists.esipfed.org>> wrote:
>
>> I should have included this formatted snippet from the original ACDD 
>> 1.0 at
>> http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html 
>> :
>>
>>
>> creator_name 
>> <http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html#creator_name_Attribute>
>> 	The data creator's name, URL, and email. The "institution" attribute 
>> will be used if the "creator_name" attribute does not exist.
>> 	metadata/creator/name
>> creator_url 
>> <http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html#creator_url_Attribute>
>> 	metadata/creator/contact at url
>> creator_email 
>> <http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html#creator_email_Attribute>
>> 	metadata/creator/contact at email
>> institution 
>> <http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html#institution_Attribute>
>> 	metadata/creator/name
>> project
>> <http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html#project_Attribute> 
>> 	The scientific project that produced the data.
>> 	metadata/project
>>
>>
>> Isn't it clear that creator_name is for the data creator's name?
>> Isn't it clear that institution and project are the creator's 
>> institution and project?
>>
>> Nan, if someone misread this and used creator_name, institution, or 
>> project for some person/group other than the creator, then that is 
>> their mistake.  There is no need to deprecate these attribute names 
>> and create new attribute names just because someone misread/misused 
>> these attributes. That is their mistake. Tell them so they can fix it.
>>
>>
>> On 2014-09-26 12:14 PM, Bob Simons - NOAA Federal wrote:
>>>
>>> On 2014-09-26 11:47 AM, Nan Galbraith wrote:
>>>> Hi all -
>>>>
>>>> If you're using an ACDD version number in your metadata, these
>>>> updates shouldn't cause any problems for you. If you eventually
>>>> decide that there's some value in the revised spec, you can adopt
>>>> the changes, otherwise your data & code should be just fine.
>>> What about those other standards (e.g., IOOS Gliders) that have 
>>> adopted ACDD 1.0 attribute names but in the future would like to add 
>>> some of the new ACDD 1.3 attribute names?
>>>
>>>>
>>>> IMHO, changing the definitions of the terms in a standard is far
>>>> worse than changing the terms themselves. Creator_name was
>>>> originally defined as "data creator's name" - really no definition
>>>> at all. If we change that to add any meaning (one who originally
>>>> collected the data, or who created the data file?) we could make
>>>> metadata in existing data sets incorrect.
>>> Only if someone really misread the 1.0 definitions.
>>>
>>> 1.0 says "data creator's name" not "data file creator's name". It is 
>>> the person/group that created the data.
>>> You're taking an odd reading of the definition, or some group's misuse 
>>> of the existing name and definition, and saying that that justifies 
>>> deprecating the attribute.
>>>
>>> And the "institution" definition is grouped with creator, so it is 
>>> clearly the data creator's institution.
>>>
>>> And "project" is defined in ACDD 1.0 as "The scientific project that 
>>> produced the data."   (Not, e.g., the group that published the 
>>> data.) It is clearly not the publisher's project.
>>>
>>> So there is no need to change the existing attribute names.
>>>
>>>>
>>>> Other than that, though, I see your point, and sympathize with
>>>> your reluctance to change these terms.  I also agree that we may have
>>>> changed some terms without strictly needing to,
>>> Exactly.
>>>> but in the case of
>>>> creator_project and creator_institution (vs project and institution) I
>>>> think that allows for documenting other projects and institutions -
>>>> e.g. the project/institution that processes, aggregates, and/or
>>>> distributes a dataset might want some visibility for their efforts. In
>>>> the original version, they got that only at the expense of the 
>>>> originator's
>>>> information.
>>> Fine. Then leave project and institution as is for the creator, but 
>>> add publisher_project and publisher_institution.
>>>
>>>>
>>>> I've seen the effect of this many times, where data collected by a 
>>>> PI in
>>>> my group appears on a portal with only the name and institution of
>>>> the last person to handle it. Or, when I send my data to OceanSITES 
>>>> for
>>>> distribution,  I'd like the OceanSITES project to be part of the 
>>>> metadata,
>>>> but not to remove the original information about the person and 
>>>> project
>>>> that collected and originally provided the data. Having 
>>>> creator_project
>>>> and _institution as named fields makes this information more likely to
>>>> be preserved as it should be.
>>> I understand. Leaving project and institution as is for the creator, 
>>> and adding publisher_project and publisher_institution offers a 
>>> viable solution.
>>>
>>>>
>>>> Regards -
>>>> Nan
>>>>
>>>>
>>>> On 9/26/14 9:46 AM, Nancy Ritchey - NOAA Federal via 
>>>> Esip-documentation wrote:
>>>>> Bob,
>>>>> Well said!  I agree with your assessment.  We've spent many years 
>>>>> working with our providers to use these standards appropriately 
>>>>> allowing the use of common tools across multiple platforms and 
>>>>> communities.  Changing the standard as proposed will have many 
>>>>> unintentional consequences that may negate its future use.  A 
>>>>> thoughtful, practical solution is needed.
>>>>> Nancy Ritchey
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: *Bob Simons - NOAA Federal via 
>>>>> Esip-documentation*<esip-documentation at lists.esipfed.org 
>>>>> <mailto:esip-documentation at lists.esipfed.org>>
>>>>> Date: Thu, Sep 25, 2014 at 7:03 PM
>>>>> Subject: [Esip-documentation] ACDD: creator, project, institution
>>>>> To: John Graybeal <john.graybeal at marinexplore.com 
>>>>> <mailto:john.graybeal at marinexplore.com>>, ESIP Documentation 
>>>>> <esip-documentation at lists.esipfed.org 
>>>>> <mailto:esip-documentation at lists.esipfed.org>>
>>>>>
>>>>>
>>>>> I'm sure I'm coming late to this discussion:
>>>>>
>>>>> Why does ACDD 1.3 have creator, not creator_name, like 1.0?
>>>>> Why does ACDD 1.3 have creator_project, not project, like 1.0?
>>>>> Why does ACDD 1.3 have creator_institution, not institution (which 
>>>>> is in CF!), like 1.0?
>>>>> If you want to add creator_institution_info, why not just add 
>>>>> institution_info?
>>>>>
>>>>> It seems like these changes are just to change to names that the 
>>>>> new ACDD group prefers, but at a HUGE cost.
>>>>> I have 1000's of datasets that have creator_name, project, and 
>>>>> institution attributes.
>>>>> I have written software, ERDDAP, that strongly recommends 
>>>>> creator_name and requires institution.
>>>>> I have told numerous people and groups to follow the ACDD standard.
>>>>> Now you are breaking your own standard.
>>>>> The new ACDD group seems to think there are no consequences to 
>>>>> changing attribute names and that it can be done just to suit the 
>>>>> group's fancy.
>>>>> It doesn't matter if you or I think the new names are better. That 
>>>>> is not the issue.  If you are unhappy with the old system, change 
>>>>> the definitions to clarify the attribute's usage, don't change the 
>>>>> attribute names. Changes that break the old standard are wrong, 
>>>>> wrong, wrong.
>>>>> And no, saying that all attributes are optional doesn't make it 
>>>>> okay to change the attribute's names. If ACDD says that the data 
>>>>> creator's name is in an attribute called creator_name, then that 
>>>>> is where it should be (last year, this year, next year, and in 50 
>>>>> years).
>>>>>
>>>>> ---
>>>>> Standards should be backwards compatible.
>>>>> Standards should be as stable as possible.
>>>>> ACDD should be cleaning up the definitions of existing attributes 
>>>>> and sparingly adding new attributes that provide a place for new 
>>>>> pieces of information, NOT changing existing attribute names.
>>>>>
>>>>>
>>>>> -- 
>>>>> Sincerely,
>>>>>
>>>>> Bob Simons
>>>>>
>>>>
>>>>
>>>
>>> -- 
>>> Sincerely,
>>>
>>> Bob Simons
>>> IT Specialist
>>> Environmental Research Division
>>> NOAA Southwest Fisheries Science Center
>>> 1352 Lighthouse Ave
>>> Pacific Grove, CA 93950-2079
>>> Phone: (831)333-9878 (Changed 2014-08-20)
>>> Fax: (831)648-8440
>>> Email: bob.simons at noaa.gov
>>>
>>> The contents of this message are mine personally and
>>> do not necessarily reflect any position of the
>>> Government or the National Oceanic and Atmospheric
>>> Administration.
>>> <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
>>>
>

-- 
Sincerely,

Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
1352 Lighthouse Ave
Pacific Grove, CA 93950-2079
Phone: (831)333-9878 (Changed 2014-08-20)
Fax: (831)648-8440
Email: bob.simons at noaa.gov

The contents of this message are mine personally and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric
Administration.
<>< <>< <>< <>< <>< <>< <>< <>< <>< <><


