[Esip-documentation] Fwd: Re: ACDD: creator, project, institution - 2

Ted Habermann via Esip-documentation esip-documentation at lists.esipfed.org
Mon Sep 29 13:52:32 EDT 2014


Rich et al.,

I agree that developers play an important role in this business, but they also tend to be the most recalcitrant members of the team because they have to actually make things work…

Of course, data providers and users are also really important.

My lesson learned here is that we need to have clear and well-articulated user needs as we move forward. I will definitely take that into account as we move to the OCDD… I have started to try to organize presentations about benefits of changes in ISO in this way (http://www.slideshare.net/tedhabermann/19115-questionsandanswers and http://www.slideshare.net/tedhabermann/19157-questionsandanswers).

Ted

On Sep 29, 2014, at 9:13 AM, Signell, Richard via Esip-documentation <esip-documentation at lists.esipfed.org> wrote:

Gang,

At first I was thinking that Bob's outrage was not productive (I could practically see the bulging veins on the side of his neck), but then again, it *did* cause me to read the entire e-mail.  ;-)

And on reflection, I think I agree with Bob on all these points. It doesn't matter so much what the actual names are -- we just need to make sure that people understand how they are to be used. So keeping the existing names, enabling backwards compatibility, along with perhaps more documentation/examples of how they are to be used, seems sensible, as does adding new unique names to 1.3 for content not covered in 1.0.

-Rich

P.S. I think when client developers speak, we should listen to them very carefully -- after all, they are the folks who we are really building standards for -- and they will hopefully leverage those standards for the benefit of the end-user and public.


On Mon, Sep 29, 2014 at 10:58 AM, Bob Simons - NOAA Federal via Esip-documentation <esip-documentation at lists.esipfed.org> wrote:
My replies are interspersed.


On 2014-09-26 3:05 PM, John Graybeal wrote:
Bob,

I'm just answering these two most recent questions for the moment, to introduce to everyone the types of use cases many of us have had to deal with.

This is not meant to argue with your position -- in a little bit I'll offer an overall option to address that, once I have it fully laid out. So please hold off on restating your position in response, until I've had a chance to address directly. Thanks!


Isn't it clear that creator_name is for the data creator's name?

The question that comes up for a user is "what does 'data creator' mean for our situation?" See my use case below.

If you look at the ISO Translation Notes in ACDD 1.1, you can see the same issues popping up, where creator_name is mapped to 3 different roles: citation/citedResponsibleParty role=originator, point of contact, and metadata contact. This (one name for multiple meanings) wasn't working for our use cases.
creator_name should be mapped to originator. There is no need to change this attribute's name.
If you want a separate point of contact, then add data_contact_name and data_contact_email to ACDD 1.3.
If you want a separate metadata contact that isn't the publisher, then add metadata_contact_name and metadata_contact_email to ACDD 1.3.
Changing creator_name to creator does not solve the one-name-for-multiple-meanings problem. Adding other x_name and x_email does.
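
A minimal sketch of the one-name-per-meaning idea in the reply above, assuming attributes are held in a plain Python dict. The data_contact_* and metadata_contact_* names are the additions proposed in this message, not part of ACDD 1.0; the table ROLE_BY_PREFIX, the function roles_from_attributes, and the role labels are hypothetical and only illustrate that each name prefix maps to exactly one role.

```python
# Hypothetical one-to-one mapping from attribute-name prefix to a single role.
# "creator" -> originator follows the reply above; the other two prefixes are
# the proposed additions, and their role labels are illustrative strings.
ROLE_BY_PREFIX = {
    "creator": "originator",
    "data_contact": "pointOfContact",
    "metadata_contact": "metadata contact",
}


def roles_from_attributes(attrs):
    """Group x_name / x_email attributes under the one role their prefix maps to."""
    roles = {}
    for prefix, role in ROLE_BY_PREFIX.items():
        entry = {}
        for field in ("name", "email"):
            value = attrs.get(f"{prefix}_{field}")
            if value:
                entry[field] = value
        if entry:  # only report roles that actually have information
            roles[role] = entry
    return roles
```

With this layout, a dataset that carries only creator_name yields a single originator entry, and adding data_contact_name never changes what creator_name means.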


Isn't it clear that institution and project are the creator's institution and project?

No, it isn't. Many data sets are created by folks who don't have an institution or have multiple institutions, or are created by an institution (not a user) that has nothing to do with the actual final presentation or intellectual ownership of the data set.
Yes, it is. Seriously?!  Read and look at the definition in ACDD 1.0 at
http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html
It is very clear.
I don't care if people were confused. People will always be confused. Explain it to them and move on.

institution is clearly defined as the creator's institution.
If you want a separate publisher_institution, fine, add it.
Changing institution to creator_institution does not solve the problem of different institutions. Adding other x_institution does.

[Ah. Given all the examples, below, let me add one guideline to clarify the meaning of creator: I mean the creator of this data (to which the metadata is attached). If some data is significantly changed (e.g., beyond QA/QC, e.g., combined with other data) then it becomes a new product and so has a new creator.  The original creators are relegated to "history" (literally and figuratively) and/or external documentation. This isn't my idea. This is a common practice and common sense. If some PI at JPL creates a model combining data from several sources, he is the creator of the model data. Perhaps ACDD 1.3 just needs to state this.]


Here's a use case I had to deal with:
My answers are for my proposal:
creator_name, project, and institution all apply to the data creator and already have that definition in ACDD 1.0.

Add: (see terms added below)

Note: Okay. I'll play this game. But if you're just going to pick this apart because you envision the situation slightly differently than I do, don't bother.
My point is: All these situations can be handled by attribute names already in ACDD 1.0 plus several new attribute names for new items.  There is no need to deprecate ACDD 1.0 names and replace them just for the sake of replacing them.


1) Someone makes an observation with their custom instrument. Call her A. Her institution is Z.
creator_name=A
institution=Z
2) The data from the instrument is collected by an ocean observing system called B (without which the data could not be collected), and is published by that system. Let's call that the raw data. B doesn't have an institution (really, it is built by dozens of institutions).

creator_name=A or B:  (Personally, I vote for A, since B is the publisher, but that is certainly determined by mutual agreement between A and B ahead of time.)
publisher_name/url/email=from B
(If B is an Ocean Observing System built by a big consortium, the OOS certainly has a name. Show me one that doesn't. More likely is that there are two casually cooperating groups -- fine, list them both.)
Maybe you need to pull in contributor_x from ACDD 1.0. It depends on the details.

3) A standard process in the ocean observing system (written by team C from institution X) takes the raw data and performs automated QC checks. Let's call that QC data.
Use processing_level from ACDD 1.0 ("A textual description of the processing (or quality control) level of the data.")
None of the proposals in ACDD 1.3 deal with this.
There is no need to deprecate attribute names from ACDD 1.0 to deal with this.

4) A second process in the ocean observing system (written by contributor D from institution W) takes the QC data, interpolates it, and combines it with other data to create a grid. Let's call that gridded data.
"combines with other data" means it is a new product. So,
creator_name=D
creator_url=(if there is an external web page about how this dataset was created)
institution=W
history=(information about how this dataset was created)


5) An individual E (from institution V) takes a subset of the gridded data and publishes that as part of their paper, also submitting it back to the observatory as a reprocessed data set.
This is pretty vague.
If he didn't reprocess it (and he's just saying he did), the observatory shouldn't accept it.
If he did reprocess it:
creator_name=E
institution=V
history= details the original source data and the processing steps he took.
publisher_name/email/url=from B

6) The ocean observatory publishes the newly submitted, reprocessed data set. Let's call that the curated reprocessed data.
This is pretty vague. It depends a lot on what "curated" means here. You say the published data is as E submitted it, so it sounds lightly curated. So:
The same answers as 5.  B is getting the same credit that a book company gets for helping an author publish his/her book.

7) A republishing system (like ERDDAP), call it F, out of institution U, takes the curated reprocessed data, and re-offers it in multiple formats. Let's call that the reformatted data.
Multiple formats are just different representations of the same data. All the data remains the same.
The ERDDAP administrator's name/address/email get added to the ISO metadata as the contact for the service (which doesn't replace or change the creator or the publisher or other information which is also in the ISO metadata).
ERDDAP doesn't do much (especially compared to the creator). It doesn't take much credit.


If I make a table for each of those products 1 through 7, what do you think the creator_name is for each? And is the institution always creator's institution?
Yes. institution is defined as the creator's institution.
I see that I should add one guideline to the definition of creator: I mean the creator of this data (to which the metadata is attached). If some data is significantly changed (e.g., beyond QA/QC, e.g., combined with other data) then it becomes a new product and so gets a new creator.  The original creators are relegated to "history" (literally and figuratively) and/or external documentation. This isn't my idea. This is a common practice and common sense. If some PI at JPL creates a model combining data from several sources, he is the creator of the model data. Perhaps ACDD 1.3 just needs to state this.
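
The answers to use cases 1 through 7 above can be collected into a rough sketch, with global attributes held as plain Python dicts. The values "A", "B", "Z" and so on are the placeholder names from the use case. Two entries are assumptions rather than statements from the thread: product 2 uses creator A (the reply says A or B, by agreement), and product 3 keeps creator A because the guideline says QA/QC alone does not make a new product. The history and processing_level strings are made up for illustration.

```python
# Global attributes for products 1-7, following the answers given above.
raw = {"creator_name": "A", "institution": "Z"}                       # 1
published_raw = {**raw, "publisher_name": "B"}                         # 2 (assumes A, not B)
qc = {**published_raw, "processing_level": "automated QC checks"}      # 3 (assumes creator stays A)
gridded = {                                                            # 4: new product, new creator
    "creator_name": "D",
    "institution": "W",
    "history": "QC data interpolated and combined with other data",
}
reprocessed = {                                                        # 5: new product again
    "creator_name": "E",
    "institution": "V",
    "history": "subset of gridded data, reprocessed",
    "publisher_name": "B",
}
curated = dict(reprocessed)      # 6: same answers as 5
reformatted = dict(curated)      # 7: new formats only, data and creator unchanged

products = [raw, published_raw, qc, gridded, reprocessed, curated, reformatted]
creators = [(p["creator_name"], p["institution"]) for p in products]
```

In this sketch institution always travels with creator_name, and the creator changes only at products 4 and 5, where the data itself changes.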


For us, these answers were not obvious from the existing names or definitions. That's what motivated us to spend this long time trying to make improvements.
All of these situations seemed clear to me (although they are hypothetical so it is easy to pick nits based on different understanding of the situation). I deal with these using ACDD 1.0 all of the time.



John


On Sep 26, 2014, at 12:50, Bob Simons - NOAA Federal via Esip-documentation <esip-documentation at lists.esipfed.org> wrote:

I should have included this formatted snippet from the original ACDD 1.0 at
http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html :


creator_name <http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html#creator_name_Attribute>
        The data creator's name, URL, and email. The "institution" attribute will be used if the "creator_name" attribute does not exist.
        metadata/creator/name

creator_url <http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html#creator_url_Attribute>
        metadata/creator/contact@url

creator_email <http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html#creator_email_Attribute>
        metadata/creator/contact@email

institution <http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html#institution_Attribute>
        metadata/creator/name

project <http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html#project_Attribute>
        The scientific project that produced the data.
        metadata/project
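
The fallback rule quoted in the snippet above ("institution" is used if "creator_name" does not exist) could be sketched as follows. This assumes attributes are held in a plain Python dict; the function name resolve_creator_name is hypothetical and is not taken from THREDDS or any other library.

```python
def resolve_creator_name(attrs):
    """Return the value for metadata/creator/name under the ACDD 1.0 wording.

    creator_name wins when it exists; otherwise fall back to institution.
    Returns None when neither attribute is present.
    """
    if "creator_name" in attrs:
        return attrs["creator_name"]
    return attrs.get("institution")
```

Because both attributes feed the same metadata/creator/name slot, the fallback only makes sense if institution is read as the creator's institution.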

Isn't it clear that creator_name is for the data creator's name?
Isn't it clear that institution and project are the creator's institution and project?

Nan, if someone misread this and used creator_name, institution, or project for some person/group other than the creator, then that is their mistake.  There is no need to deprecate these attribute names and create new attribute names just because someone misread/misused these attributes. That is their mistake. Tell them so they can fix it.


On 2014-09-26 12:14 PM, Bob Simons - NOAA Federal wrote:

On 2014-09-26 11:47 AM, Nan Galbraith wrote:
Hi all -

If you're using an ACDD version number in your metadata, these
updates shouldn't cause any problems for you. If you eventually
decide that there's some value in the revised spec, you can adopt
the changes, otherwise your data & code should be just fine.
What about those other standards (e.g., IOOS Gliders) that have adopted ACDD 1.0 attribute names but in the future would like to add some of the new ACDD 1.3 attribute names?


IMHO, changing the definitions of the terms in a standard is far
worse than changing the terms themselves. Creator_name was
originally defined as "data creator's name" - really no definition
at all. If we change that to add any meaning (one who originally
collected the data, or who created the data file?) we could make
metadata in existing data sets incorrect.
Only if someone really misread the 1.0 definitions.

1.0 says "data creator's name" not "data file creator's name". It is the person/group that created the data.
You're taking an odd reading of the definition or some group's misuse of the existing name and definition and saying that that justifies deprecating the attribute.

And the "institution" definition is grouped with creator, so it is clearly the data creator's institution.

And "project" is defined in ACDD 1.0 as "The scientific project that produced the data."   (Not, e.g., the group that published the data.) It is clearly not the publisher's project.

So there is no need to change the existing attribute names.


Other than that, though, I see your point, and sympathize with
your reluctance to change these terms.  I also agree that we may have
changed some terms without strictly needing to,
Exactly.
but in the case of
creator_project and creator_institution (vs project and institution) I
think that allows for documenting other projects and institutions -
e.g. the project/institution that processes, aggregates, and/or
distributes a dataset might want some visibility for their efforts. In
the original version, they got that only at the expense of the originator's
information.
Fine. Then leave project and institution as is for the creator, but add publisher_project and publisher_institution.


I've seen the effect of this many times, where data collected by a PI in
my group appears on a portal with only the name and institution of
the last person to handle it. Or, when I send my data to OceanSITES for
distribution,  I'd like the OceanSITES project to be part of the metadata,
but not to remove the original information about the person and project
that collected and originally provided the data. Having creator_project
and _institution as named fields makes this information more likely to
be preserved as it should be.
I understand. Leaving project and institution as is for the creator, and adding publisher_project and publisher_institution offers a viable solution.
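
A minimal sketch of that solution, assuming attributes are held in a plain Python dict. publisher_project and publisher_institution are the names proposed in this thread, not ACDD 1.0 attributes, and the helper add_publisher is hypothetical.

```python
def add_publisher(attrs, publisher_project=None, publisher_institution=None):
    """Return a copy of attrs with publisher_* attributes added.

    The creator's project and institution are never overwritten, so the
    originator's information survives redistribution.
    """
    out = dict(attrs)
    if publisher_project is not None:
        out["publisher_project"] = publisher_project
    if publisher_institution is not None:
        out["publisher_institution"] = publisher_institution
    return out
```

For the OceanSITES case described above, calling add_publisher(attrs, publisher_project="OceanSITES") would record the distributing project while leaving project and institution as the creator supplied them.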


Regards -
Nan


On 9/26/14 9:46 AM, Nancy Ritchey - NOAA Federal via Esip-documentation wrote:
Bob,
Well said!  I agree with your assessment.  We've spent many years working with our providers to use these standards appropriately allowing the use of common tools across multiple platforms and communities.  Changing the standard as proposed will have many unintentional consequences that may negate its future use.  A thoughtful, practical solution is needed.
Nancy Ritchey

---------- Forwarded message ----------
From: *Bob Simons - NOAA Federal via Esip-documentation* <esip-documentation at lists.esipfed.org>
Date: Thu, Sep 25, 2014 at 7:03 PM
Subject: [Esip-documentation] ACDD: creator, project, institution
To: John Graybeal <john.graybeal at marinexplore.com>, ESIP Documentation <esip-documentation at lists.esipfed.org>


I'm sure I'm coming late to this discussion:

Why does ACDD 1.3 have creator, not creator_name, like 1.0?
Why does ACDD 1.3 have creator_project, not project, like 1.0?
Why does ACDD 1.3 have creator_institution, not institution (which is in CF!), like 1.0?
If you want to add creator_institution_info, why not just add institution_info?

It seems like these changes are just to change to names that the new ACDD group prefers, but at a HUGE cost.
I have 1000's of datasets that have creator_name, project, and institution attributes.
I have written software, ERDDAP, that strongly recommends creator_name and requires institution.
I have told numerous people and groups to follow the ACDD standard.
Now you are breaking your own standard.
The new ACDD group seems to think there are no consequences to changing attribute names and that it can be done just to suit the group's fancy.
It doesn't matter if you or I think the new names are better. That is not the issue.  If you are unhappy with the old system, change the definitions to clarify the attribute's usage, don't change the attribute names. Changes that break the old standard are wrong, wrong, wrong.
And no, saying that all attributes are optional doesn't make it okay to change the attribute's names. If ACDD says that the data creator's name is in an attribute called creator_name, then that is where it should be (last year, this year, next year, and in 50 years).

---
Standards should be backwards compatible.
Standards should be as stable as possible.
ACDD should be cleaning up the definitions of existing attributes and sparingly adding new attributes that provide a place for new pieces of information, NOT changing existing attribute names.
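
To make the cost of renaming concrete, here is a rough sketch of what client software would need in order to read datasets written with either set of names. The three renames are the ones questioned at the top of this message; the table DRAFT_1_3_NAME and the function read_attribute are hypothetical, and attributes are assumed to be held in a plain Python dict.

```python
# ACDD 1.0 name -> name used in the ACDD 1.3 draft discussed in this thread.
DRAFT_1_3_NAME = {
    "creator_name": "creator",
    "project": "creator_project",
    "institution": "creator_institution",
}


def read_attribute(attrs, name_1_0):
    """Look up an attribute by its ACDD 1.0 name, then by its draft 1.3 name."""
    if name_1_0 in attrs:
        return attrs[name_1_0]
    # Names that were not renamed simply map to themselves.
    return attrs.get(DRAFT_1_3_NAME.get(name_1_0, name_1_0))
```

If the 1.0 names are kept and only new names are added, a lookup table like this is unnecessary and existing datasets remain readable as they are.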


--
Sincerely,

Bob Simons




--
Sincerely,

Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
1352 Lighthouse Ave
Pacific Grove, CA 93950-2079
Phone: (831)333-9878 (Changed 2014-08-20)
Fax: (831)648-8440
Email: bob.simons at noaa.gov

The contents of this message are mine personally and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric
Administration.
<>< <>< <>< <>< <>< <>< <>< <>< <>< <><

_______________________________________________
Esip-documentation mailing list
Esip-documentation at lists.esipfed.org
http://www.lists.esipfed.org/mailman/listinfo/esip-documentation






--
Dr. Richard P. Signell   (508) 457-2229
USGS, 384 Woods Hole Rd.
Woods Hole, MA 02543-1598
