[Esip-documentation] New ACDD home page

John Graybeal graybeal at marinemetadata.org
Tue May 28 12:11:09 EDT 2013


Good feedback. Just some quick responses for clarification purposes:

> IMHO, the dates of importance, at least for observational data are 1. when the last 'data value'  was modified - the 'data version date' and 2. when the NetCDF was written - the 'file creation date'.  

That's what date_modified and date_created are meant to be. My definitions may need improvement to match. I am treating these dates as "what the public sees", not what the system is doing internally.

>  is there a use case for recording the date on which a data set was first released? 

I wasn't sure what 'released' meant. In many environments, arguably most, I'd say it is the same as date_created. (When the user can see that the file/data set has been 'created', that means the system has 'released' the data to the user for the first time. QED. :->) I couldn't come up with an additional use case that wasn't satisfied by this construction.

But there was the use case for knowing that much, namely "When did this data set first come into existence?" (In order to identify how long it's been available, how mature it is, whether the system has published it since the first raw byte was available, etc.)  Not critical in my book, but a lot of specifications seem to think it is.
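
To make the intended behavior of the two dates concrete, here is a minimal Python sketch. It assumes the global attributes are held in a plain dict and that dates are written as UTC ISO 8601 timestamps; the function name stamp_dates and that timestamp format are illustrative choices, not part of the ACDD definitions.

```python
from datetime import datetime, timezone


def stamp_dates(attrs, when):
    """Return a copy of attrs with date_created/date_modified updated.

    Hypothetical helper: `attrs` stands in for a file's global attributes,
    and `when` must be a timezone-aware datetime.
    """
    # Assumed format: UTC, second precision, trailing 'Z'.
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    updated = dict(attrs)
    # date_created is set on first publication and never changes afterwards.
    updated.setdefault("date_created", stamp)
    # date_modified tracks the last change visible to users.
    updated["date_modified"] = stamp
    return updated


first = stamp_dates({}, datetime(2013, 5, 20, 12, 0, tzinfo=timezone.utc))
later = stamp_dates(first, datetime(2013, 5, 28, 16, 11, tzinfo=timezone.utc))
```

In this sketch, `later` keeps the date_created value from `first`, and only date_modified moves forward.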

> Re: time_coverage_resolution - should we change the term to time_coverage_interval, or similar, since 'resolution' has so many meanings?  And should we specifically recommend that it be expressed as an ISO 8601 duration string, e.g. PT1H for one hour, or is it useful as free text? 


I thought resolution was parallel with the other two dimensions' 'resolutions', where 'interval' didn't make sense. But your call on that.  

I like ISO 8601 for this, personally.
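
As a rough illustration of why a structured value helps, here is a minimal Python sketch that reads an ISO 8601 duration into a timedelta. Note that in ISO 8601 the time components follow a T designator, so one hour is written PT1H. The function name parse_resolution is hypothetical, and the sketch deliberately handles only weeks, days, hours, minutes, and seconds; years and months are rejected because they have no fixed length.

```python
import re
from datetime import timedelta

# Week/day designators come before 'T'; hour/minute/second designators after it.
_DURATION = re.compile(
    r"^P(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_resolution(text):
    """Convert a duration such as 'PT1H' or 'P1D' into a timedelta.

    Hypothetical helper covering only a subset of ISO 8601 durations.
    """
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"not a supported ISO 8601 duration: {text!r}")
    parts = {k: float(v) for k, v in match.groupdict().items() if v is not None}
    # Reject strings with no components ('P', 'PT') or a dangling 'T' ('P1DT').
    if not parts or text.endswith("T"):
        raise ValueError(f"not a supported ISO 8601 duration: {text!r}")
    return timedelta(**parts)


one_hour = parse_resolution("PT1H")
one_day = parse_resolution("P1D")
```

A free-text value such as "hourly" would raise ValueError here, which is the trade-off: structured strings are machine-comparable, free text is not.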

> wonder if we could go with the (slightly less symmetrical) terms creator_name, creator_info, creator_institution, creator_institution_info - which assumes that an 'unmodified' creator is by default a person.

No objection. Please consider 'creator_institution_name' as the third one, so as to be parallel.

The rest of your questions are good, and I don't have an opinion/answer on them at this point. I think any direction the group chooses could be suitable.

John


On May 28, 2013, at 04:59, Nan Galbraith <ngalbraith at whoi.edu> wrote:

> Hi John - 
> 
> Nice work!  I appreciate that you've provided definitions that are not circular - creator
> is defined as ' person principally responsible for originating this data', for example - much 
> more clear.
> 
> Some quibbles with file dates, which are now defined more clearly (so that I may disagree
> with them)
> date_created 
> The first date on which this dataset was published (this value never changes after 
> first set of data is released the first time).
> 
> date_modified 
> The date on which this dataset (as seen by users or captured in a file) was last 
> changed.
> 
> 
> I think model runs and observational data files may need different file time information,
> but I can only speak to the obs side. That said, is there a use case for recording the date 
> on which a data set was first released? 
> 
> IMHO, the dates of importance, at least for observational data are 1. when the last 'data value' 
> was modified - the 'data version date' and 2. when the NetCDF was written - the 'file creation 
> date'.  I need the former because it determines whether a file contains the latest version of
> the observations - post-cals applied, revised algorithms used, more de-spiking done, 
> whatever.  I need the latter because it lets me know if the file contains the latest metadata, 
> the latest conventions (for those that are evolving), and if it was written by a good netcdf 
> conversion run.
> 
> 'The first date on which this dataset was published' seems like it would require some definitions;
> actually I think it might be impossible to pin down for  any project where real time data - possibly 
> even pre-deployment data - is published, updated, and replaced  with post-recovery data.  What 
> does 'dataset' mean in this context?  What does 'published'  mean?  If I put it on my group's web 
> site, is it published? Is there some other definition for this, or more to the point, an idea of why
> this date might be required?
> 
> Re: time_coverage_resolution - should we change the term to time_coverage_interval, or 
> similar, since 'resolution' has so many meanings?  And should we specifically recommend 
> that it be expressed as an ISO 8601 duration string, e.g. PT1H for one hour, or is it useful as
> free text? 
> 
> Re: creator and publisher fields -  I really like the way you've developed these, especially the way 
> you've expanded the publisher fields  - that gives a project like oceansites a place to be identified. 
> 
> Quibble: I can't say I like the term creator_person, and wonder if we could go with the (slightly
> less symmetrical) terms creator_name, creator_info, creator_institution, creator_institution_info -
> which assumes that an 'unmodified' creator is by default a person.
> 
> The definition of the _info fields, 'can include any information as ISO 19139 or free text' is 
> a little too vague, IMHO, in terms of guidance.
> 
> How should the '_info' information be presented in an ISO 19139 compliant way? Can we
> just choose some fields within CI_ResponsibleParty and list those, or are we thinking
> of an xml snippet for this attribute?  An example (from OGC) could be coded either as:
> 
> creator_info: 'organisationName:con terra GmbH, email:voges at conterra.de' ;
> 
> or as:
> 
> creator_info: '<contact>
>              <CI_ResponsibleParty>
>                 <individualName>
>                    <gco:CharacterString>Uwe Voges</gco:CharacterString>
>                 </individualName>
>                 <organisationName>
>                    <gco:CharacterString>con terra GmbH</gco:CharacterString>
>                 </organisationName>
>                 <contactInfo>
>                    <CI_Contact>
>                       <address>
>                         <CI_Address>
>                            <electronicMailAddress>
>                              <gco:CharacterString>voges at conterra.de</gco:CharacterString>
>                            </electronicMailAddress>
>                         </CI_Address>
>                      </address>
>                   </CI_Contact>
>                 </contactInfo>
>             </CI_ResponsibleParty>
>         </contact>' ;
> 
> Do we recommend one over the other?  Will a multi-line, verbose attribute like the
> latter be hard for users to implement? Does it add any functionality?
> 
> Thanks again -
> 
> Nan
> 
> On 5/20/13 9:54 PM, John Graybeal wrote:
>> Hi everyone, 
>> 
>> I talked with David N and Derrick S end of last week, and we agreed on some basic strategies, and today I finally updated all the definitions [1].
>> 
>> Please consider these new definitions -- and deletions [3], additions, and rearrangements into different categories -- food for discussion.  You might want to first decide whether to broadly accept the approach in each case, then nit-pick the definitions.
>> 
>> The approaches are documented in some detail in the discussion page of the Working document [2]. But extremely briefly:
>> - Neither totally computable, nor totally incomputable (but just right :->); encouraged use of structured text in many fields
>> - Somewhat flat, but somewhat structured: structured text in fields (optionally), but fewer fields with ancillary metadata
>> - Support a range of keyword styles, but avoid recommending any in particular; support multiple keyword and standard_name vocabularies
>> - Generally did not include guidance, as much for lack of time as anything. I think a third column with guidance/references would be most valuable.
>> - Reflected recommendations like http://wiki.esipfed.org/index.php/NetCDF,_HDF,_and_ISO_Metadata in terms of the skeleton, but considered them more guidance than something we could require at this stage. (Small steps.)
>> 
>> Other changes broadly described:
>> - Geospatiotemporal: Now much more explicit about the geospatiotemporal attributes
>> - Lineage: Now much more explicit about possible forms for this information.
>> 
>> As someone who works a lot with rich structured metadata, I like the flexibility this new approach gives to do that. Conversely, I don't think it shuts down any of the less formal providers/documenters of metadata. I'll be curious to see your inputs.
>> 
>> John
>> 
>> 
>> [1] http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery_(ACDD)_Working
>> [2] http://wiki.esipfed.org/index.php/Talk:Attribute_Convention_for_Data_Discovery_(ACDD)_Working 
>> [3] We might want to move deletions into a deprecated section, so that they are allowed for backwards compatibility.
>> 
>> ======================================  (Past email thread, for reference)
>> 
>> In any case I will be tossing some additional specific text for each term on the working page (http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery_(ACDD)_Working).  Please take that page into account as you go forward.
>> 
>> There is a conflict in short-term and longer-term strategies, which I'll summarize here.  One can either have a flat list of attributes, or a list that supports rich relationships (groupings and descriptions of them), or a hybrid that cobbles together a way to show rich relations in a flat list. This especially affects contacts and their roles.  I classify it as short-term vs long-term, because I'm sure someday we'll want to migrate to the richer relations approach (or at least a hybrid), but I understand that time may not be now.
>> 
>> The other connected issue that keeps popping up is use of controlled vocabularies, where you can either specify *one* vocabulary a priori for each field, or include a vocabulary field for every attribute that calls for CV terms, or allow the use of fully unique CV terms within any of these attributes. This is somewhat affected by whether your attribute list is flat or rich.
>> 
>> I will add my comments on these two topics to the discussion page, but they have a strong bearing on the best integration approach.
>> 
>> John
>> 
>> 
>> 
>> On May 9, 2013, at 09:07, David Neufeld - NOAA Affiliate <david.neufeld at noaa.gov> wrote:
>> 
>>> John,
>>> 
>>> Thanks for getting this started, I've added some information on governance.
>>> 
>>> Seems like a next step would be for Rich, Aleksandar, Ted and I to review the working draft and document some of the tweaks that crept into ncISO over the past year outside of ACDD.  Then we can discuss whether some (or all?) of those changes could be incorporated into the standard.
>>> 
>>> http://wiki.esipfed.org/index.php/Category:Attribute_Conventions_Dataset_Discovery
>>> 
>>> Dave
>>>   
> 
> -- 
> *******************************************************
> * Nan Galbraith        Information Systems Specialist *
> * Upper Ocean Processes Group            Mail Stop 29 *
> * Woods Hole Oceanographic Institution                *
> * Woods Hole, MA 02543                 (508) 289-2444 *
> *******************************************************
> 
> 


---------------
John Graybeal
Marine Metadata Interoperability Project: http://marinemetadata.org
graybeal at marinemetadata.org



