[Esip-documentation] status/open issues for ACDD approval

John Graybeal via Esip-documentation esip-documentation at lists.esipfed.org
Fri Sep 19 19:28:28 EDT 2014


Phil, all,

First of all, I don't have big heartburn with a new term called date_product_originally_created, if we need it. (Term to be argued about, maybe.)

But here's the crux of the challenge as I see it:
> I don't have a strong opinion on the new modified date attributes other than the intended attribute's meaning should be obvious to users based on the attribute's name.

From long work with vocabularies, and specific work with this vocabulary, I consider this goal worth striving for but never fully achievable.  In the specific case of these original names 2-4 that you cite, it was clear on the call about 3 months back that everyone was sure about what these names meant -- while disagreeing about what they did mean. For a simple example:
> 2) date_created (Recommended): date the dataset was created/completed

Which is this? The date the dataset was first created, or the time it was last created? Similar questions always come up when discussing those terms.
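
As a minimal illustration of that ambiguity (a hypothetical sketch, not code from any system discussed on the list), two writers can both populate date_created in good faith and still disagree after the first rewrite. The function names and the use of a plain dict for the global attributes are assumptions made for this example only.

```python
from datetime import datetime, timezone


def iso(dt):
    """Format a datetime as an ISO 8601 UTC string."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def write_first_created(attrs, now):
    """Reading A: date_created is set once and preserved on later rewrites."""
    out = dict(attrs)
    out.setdefault("date_created", iso(now))
    return out


def write_last_created(attrs, now):
    """Reading B: date_created is reset every time the file is regenerated."""
    out = dict(attrs)
    out["date_created"] = iso(now)
    return out


first = datetime(2014, 1, 15, tzinfo=timezone.utc)    # initial creation
rewrite = datetime(2014, 9, 19, tzinfo=timezone.utc)  # later regeneration

a = write_first_created(write_first_created({}, first), rewrite)
b = write_last_created(write_last_created({}, first), rewrite)

print(a["date_created"])  # 2014-01-15T00:00:00Z
print(b["date_created"])  # 2014-09-19T00:00:00Z
```

Both outputs are valid ISO 8601 values under the same attribute name, so a reader of the file cannot tell which reading the producer used.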

(The reasons for the debate seem obvious: we build different systems for different purposes. Some are file-based storage, some are databases; some are archival, others real-time; some transfer data using files, others use protocols which may or may not be backed by files; and for new data, some rewrite files, others create all new files, and still others send out messages in streams and don't create files or transfers until a request comes in. Simple terms aren't so simple across dozens of systems.)

So the choices to fix ACDD were:
- redefine existing terms (which makes us agree on a definition, but doesn't fix the name-definition disconnect and breaks some past usage); OR
- come up with new definitions that are narrowly defined around the use cases, then try to pick terms to match.
We could leave things alone but that would not fix the issue.

So we tried to do the second; with new terms and more detailed definitions, we can all start from the same place. I think our definitions satisfied your use cases already, but maybe we still have to add a new term like date_product_originally_created. 

I request that everyone look at the proposed definitions for date_product_modified, date_values_modified, and date_product_available. If they aren't clear or sufficient, please explain why and/or suggest an improvement _to the definition_. If we can agree on the definition, the name can be settled; without the definition, I am sure we will have big problems.
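
For anyone weighing the definitions, the intended behavior can be sketched roughly as follows. This is a hypothetical illustration under stated assumptions: the helper name stamp, its parameters, and the dict standing in for a file's global attributes are invented here, and the rule that date_product_available keeps its first value is one possible reading rather than agreed text.

```python
from datetime import date


def stamp(attrs, today, values_changed=False, first_release=False):
    """Return updated attributes after a product is (re)written on `today`.

    Assumes the function is called only when something actually changed.
    """
    out = dict(attrs)
    # Any change (values, attributes, or format) updates the product date.
    out["date_product_modified"] = today.isoformat()
    # Only a change to the variables' data values updates the values date.
    if values_changed:
        out["date_values_modified"] = today.isoformat()
    # Assumption: the availability date is recorded once and then kept.
    if first_release and "date_product_available" not in out:
        out["date_product_available"] = today.isoformat()
    return out


# Initial release: values are new and the product becomes available.
attrs = stamp({}, date(2014, 3, 1), values_changed=True, first_release=True)
# Later rewrite for a metadata-only fix (for example, a corrected standard name).
attrs = stamp(attrs, date(2014, 9, 19))

print(attrs["date_product_modified"])   # 2014-09-19
print(attrs["date_values_modified"])    # 2014-03-01
print(attrs["date_product_available"])  # 2014-03-01
```

In this sketch date_values_modified can never be later than date_product_modified, which matches the 'version date' versus 'file date' distinction Nan describes below.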

John


On Sep 19, 2014, at 08:53, Philip Jones - NOAA Affiliate via Esip-documentation <esip-documentation at lists.esipfed.org> wrote:

> Nan,
> 
> Thanks for sharing your use cases. I can see where it might be difficult or impossible to know the date when your data will become publicly available. However, even if an attribute is not applicable for your data, it should not be deprecated if it is applicable to other relevant data. Any deprecation should be done cautiously and with good justification. 
> 
> Also, note that the ACDD versions 1.0-1.3 already support the concept of data publication with the Recommended set of attributes "publisher" and "publisher_email". So if data can be published then it can have a publication date. The same can be said for the "creator_*" attributes and the create date.
> 
> This is the way I have interpreted and used the version 1.0 date-related attributes for a given dataset version*:
> 1) time_coverage_* (Recommended): time(s) of data coverage and resolution (not in question at this time)
> 2) date_created (Recommended): date the dataset was created/completed
> 3) date_issued (Suggested): date the dataset was first made available to the public
> 4) date_modified (Suggested) and history (Recommended): used in combination to document modifications to the original data
> *If you have another version of the data, then it should have new values for the above attributes. I don't think we need two sets of date attributes for observational data and derived products/models. 
> 
> My main preference going forward is to have/keep attributes that support the notion of an original create date and publication date. The create date should probably be Recommended and the publication date Suggested, either way they're not mandatory but exist to continue using as needed. I don't have a strong opinion on the new modified date attributes other than the intended attribute's meaning should be obvious to users based on the attribute's name.
> 
> In case you missed yesterday's meeting, John will be sending a review comment spreadsheet for version 1.3 and having a review discussion on 10/2. 
> 
> Phil
> 
> On Fri, Sep 19, 2014 at 10:03 AM, Nan Galbraith <ngalbraith at whoi.edu> wrote:
> Hi Phil and all -
> 
> For files containing observational data, the descriptors 'released', 'published' 
> and 'issued' are very nearly meaningless terms. 
> 
> My NetCDF data files each have a single 'version date' that reflects the last time 
> the observations or derived data values were changed by editing, or applying new 
> calibrations or new algorithms, and a 'file date' that includes the above plus any 
> re-write that was due to changes in some format spec (like ACDD) or maybe a 
> semantic error (such as a missed standard name). 
> 
> We have no concept of a date on which something was originally available, 
> either - assorted subsets of our surface mooring data are released in real time, 
> other parts take years to be published.
> 
> Originally, I tried to use the terms you've listed, which were in Ethan's first cut 
> of ACDD, but I had to make up definitions for them - because they really have
> no inherent meaning for my data. I think this is true of most observational and/or
> in situ data. 
> 
> Do we need 2 sets of 'file date' terms, one for products/models and one for
> observational data sets? 
> 
> Regards- 
> Nan Galbraith
> 
> 
> On 9/18/14 1:56 PM, Philip Jones - NOAA Affiliate via Esip-documentation wrote:
>> John, 
>> 
>> Thanks for considering my comments.
>> 
>> I don't see why the descriptors "released", "published" or "issued" are a problem for describing the data publication date. Most products have an associated version number and by definition this attribute would be the date that version was released. (Maybe we need to add an attribute for product version number.) I have two concerns with the current proposal, date_product_available. 1) Attributes should not have a compound purpose, e.g., date the product was originally created or made available. These can be two distinct dates. 2) The meaning of these attributes must be obvious to users by the attribute name alone, and I'm not sure the intended meaning of "available" would be obvious. Note that users of these netCDF files will not refer to the ACDD pages to look up the attribute definitions. The netCDF creators possibly will, but not users.
>> 
>> I strongly suggest we keep an attribute that supports the original create date. Just about every metadata standard, from science data to documents, images and video, supports the concept of date created and date modified. What do we gain by removing it from ACDD?
>> 
>> I can discuss it more in our meeting.
>> 
>> Phil
>> 
>> On Thu, Aug 28, 2014 at 3:49 PM, John Graybeal <jgraybeal at mindspring.com> wrote:
>> Philip,
>> 
>> No worries about the late date, if we can make it noticeably better I don't think anyone will mind a small delay in finalizing. But push to wrap up at this next meeting if we can!
>> 
>> Regarding date_product_[generated|distributed|released] : I didn't care for 'distributed' because the same product can be distributed multiple times; and I didn't care for 'released' because that word often has a formal meaning (in opposition to unreleased). Anna and I came up with date_product_available -- how does that work for you? The definition, now with further clarifications, is 
>> 
>>> date_product_available : The date on which this individual data file or other product was made available (ISO 8601 format); corresponds to ISO 19115-2 CI_DateTypeCode of "publication". This can be the date the product was originally created in some systems; for others, it may be the date the product was (first) provided to a user. This means the availability date may be after the product was first created; therefore the date_content_modified and date_values_modified should be used to assess the age of the content.
>> 
>> Let's pose the question to the group of whether date_product_generated adds value, for the purposes you identify (provenance tracking and managing additional or replacement files). I assume we are trying to assess this from the external user's perspective, and allow for file and web service protocols.  My take: Knowing when the file was created provides no inherent advantage to the user receiving that file unless he or she knows the mechanism by which the system creates files, and that the mechanism won't change. (Obviously the data system that creates and publishes the file could tie its provenance records to the file creation time, if it keeps data in files internally; but it could equally well tie it to the availability time, or a unique ID, or the provenance could be much more atomic than a whole data file.)  I'm not sure which use case you mean by 'managing additional or replacement files'; again from the user's perspective, I think all the use cases for that are addressed with the existing three attributes. Happy to work this through offline if that helps.
>> 
>> Regarding date_[content|values]_modified : The terms 'data' and 'metadata' are ambiguous in most contexts, including this one; I would not like those terms myself. Assuming we are trying to satisfy the primary use cases of "when did _anything_ change?" and "when did the values change?", maybe we can improve the first by replacing 'content' with 'product': date_product_modified. 
>> 
>> I can't think of a better term than 'values'; 'variables' to many includes the variable attributes, which we are explicitly trying to exclude. Since the definition is the important thing, maybe we can choose from a list of possible name pairs at the next meeting?  What choices would you add to the following?
>> 1)  date_content_modified, date_values_modified
>> 2)  date_product_modified, date_values_modified
>> 3)  date_data_modified, date_metadata_modified
>> 
>> 
>> John
>>   
>> 
>> 
>> 
>> 
>> On Aug 22, 2014, at 06:35, Philip Jones - NOAA Affiliate <philip.jones at noaa.gov> wrote:
>> 
>>> John, thanks for your responses.
>>> 
>>> If that is the intended meaning of date_product_generated, then I agree the attribute name should better reflect that meaning. If you want to avoid using the word "issued", then maybe use "released" or "published" in the name, for example date_released or date_published. "Distributed" can be understood as an ongoing activity, whereas "released" or "published" imply the initial distribution of a particular version.
>>> 
>>> I'm still not sure the definitions of date_content_modified and date_values_modified would be apparent to users of a data file. What about simply using date_data_modified and date_metadata_modified? 
>>> 
>>> The rationale for deprecating the date_created attribute on the ACDD page says:
>>> "date_created:deleted in favor of date_product_generated (which used to be date_issued); we did not have a use case for knowing the date a stream or product was _first_ generated, once it has been updated"
>>> Having the producer's initial create date of a file is important for provenance tracking and for managing additional or replacement files that may be created. Only using the *_modified date attributes creates a dependency on using the history attribute correctly with change details in order to determine the original create date of a file.
>>> 
>>> I apologize for the late comments and do not wish to delay plans for this ACDD version. I'll try to make the next group call.
>>>  
>>> Phil
>>> 
>>> 
>>> On Thu, Aug 21, 2014 at 4:16 PM, John Graybeal <jgraybeal at mindspring.com> wrote:
>>> Hi Philip, thanks for your input. Here are my thoughts, looking for feedback from you and the list.
>>> 
>>>> date_product_generated:  Is this attribute intended to hold the initial create date of the file?
>>> 
>>> No, it was meant to be when it was distributed (the separation you wanted). This corresponds originally to the ISO 19115-2 code /gmd:dateType/gmd:CI_DateTypeCode="publication", which says here "date identifies when the resource was issued". To me this means 'visibly released' to the external users (but some say 'issued' means 'produced' in the system).
>>> 
>>> As I understand it, the ACDD attribute targets the use case "if I went to the site on date X, was it there yet?" This is helpful for people or computers that grab information from a site every so often, to know what they don't have to grab.
>>> 
>>> So I agree the word 'generated' is confusing here; I can't find a discussion where it changed from _issued to _generated, but I think it was an attempt to avoid the ambiguity of the ISO term 'issued'. 
>>> 
>>> Perhaps this is better:
>>> 
>>> date_product_distributed: The date on which this individual data file or other product was distributed (ISO 8601 format). This may be after the product was created (but not before); therefore the date_content_modified and date_values_modified should be used to assess the age of the content. 
>>> 
>>> (I wanted to add "If the identical data file or product is distributed multiple times, this should be the first date of distribution." But it is pretty wordy already.)
>>> 
>>>> date_content_modified, date_values_modified
>>> 
>>>> Both definitions mention changes to the "data", which I presume means changes to variables in the file. Can the definitions and maybe the attribute names be clarified so that the differences between them are clear? Suggest using terminology from the netCDF data model. 
>>> 
>>> Well, that might be more precise, if we can agree. I'm a little nervous proposing a change, but let's see what people say about just changing 'data' to 'variables' and 'metadata' to 'attributes':
>>> 
>>> date_content_modified:  The date on which any of the provided content, including variables, attributes, and presented format, was last created or changed (ISO 8601 format) 
>>> 
>>> date_values_modified: The date on which the provided variables' data values were last created or changed; excludes attributes and formatting changes (ISO 8601 format) 
>>> 
>>>> can you add the original version 1 from 2005 to the wiki
>>> 
>>> Good suggestion. As discussed on the call, we'll add this.
>>> 
>>> John
>>> 
>>> 
>>> 
>>> On Aug 21, 2014, at 10:58, Philip Jones - NOAA Affiliate via Esip-documentation <esip-documentation at lists.esipfed.org> wrote:
>>> 
>>>> John, all,
>>>> 
>>>> I have a few late comments/questions on the date attributes.
>>>> 
>>>> Attribute: 
>>>> date_product_generated 
>>>>     The date on which this data file or product was produced/distributed (ISO 8601 format). While this date is like a file timestamp, the date_content_modified and date_values_modified should be used to assess the age of the contents of the file or product. 
>>>> 
>>>> Comment: 
>>>> The date-time a file was "produced" (generated) is not the same as when it was "distributed", because not all datasets are distributed in real-time. Many datasets are produced/generated weeks prior to their distribution. I recommend separating produced from distributed, which suggests that date_issued is still relevant. Is this attribute intended to hold the initial create date of the file?
>>>> 
>>>> Attributes: 
>>>> date_content_modified 
>>>>     The date on which any of the provided content, including data, metadata, and presented format, was last created or changed (ISO 8601 format) 
>>>> date_values_modified
>>>>     The date on which the provided data values were last created or changed; excludes metadata and formatting changes (ISO 8601 format) 
>>>> 
>>>> Comment: 
>>>> Both definitions mention changes to the "data", which I presume means changes to variables in the file. Can the definitions and maybe the attribute names be clarified so that the differences between them are clear? Suggest using terminology from the netCDF data model. 
>>>> 
>>>> Also, if this group is maintaining a history of all ACDD versions, can you add the original version 1 from 2005 to the wiki? It is no longer hosted at Unidata. Archive link from April 2014: https://web.archive.org/web/20140424133239/http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html
>>>> 
>>>> Phil
>>>> 
>>>> 
>>>> On Tue, Aug 19, 2014 at 7:57 PM, John Graybeal via Esip-documentation <esip-documentation at lists.esipfed.org> wrote:
>>>> Hi all,
>>>> 
>>>> In case we get time to consider ACDD Thursday, here are the issues I've seen on the discussion thread and their current status. The page at http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery_1-2_Working contains the recommended changes. 
>>>> 
>>>> If there are no further concerns raised, I'd like to do preliminary approval/consent at this call, and schedule the final approval for next call. (If there are still concerns, we can discuss them on-list or on the call.) The approved document can become Version 2, since several people have started calling it that; then everyone is free to work on a groups-aware revision, as they see fit.
>>>> 
>>>> A brief reminder: With respect to issues (1) and (2), because ACDD attributes are all recommendations -- there are no 'shall' statements in the document -- people are still within the specification while not using whatever attributes they don't like. So it isn't dysfunctional if there are attributes that some choose to omit, or deprecated terms that some choose to use.
>>>> 
>>>> === Open Topics ===
>>>> 
>>>> 1) Deprecation of date_* attributes
>>>> 
>>>> This relates to the deprecation of
>>>>     date_created, date_issued, date_modified 
>>>> attributes, while adding (not 1 for 1) 
>>>>     date_content_modified, date_values_modified, date_product_generated.
>>>> 
>>>> This topic was previously summarized in email; review that summary on the talk page[1]. If there continue to be concerns, we can vote on the best answer.
>>>> 
>>>> 2) Adoption of summary metadata for geospatiotemporal ranges (good, tolerable, or bad?)
>>>>     
>>>> Extensive discussion led to an explicit section addressing key software principles[2], and some warning text.  I have not received any critical comments since the last round of changes. (I think one critic is satisfied, another perhaps just silent. :->)  If concerns remain, we can discuss and vote.
>>>> 
>>>> 3) Organization of ACDD pages 
>>>> 
>>>> There is a bit of confusion still with the current organization. I hesitated to go wild with fixes myself, but now that I'm co-chair with Anna, I think we can just fix issues as they are identified. If you have an issue with the ACDD organization, can you please send it to the list or us, as you prefer?  With approval a lot will become more transparent.
>>>> 
>>>> John
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> [1] Summary of date_* attribute concerns: http://wiki.esipfed.org/index.php/Talk:Attribute_Convention_for_Data_Discovery_1-2_Working#Attributes_Discussed_and_Resolved
>>>> 
>>>> [2] Spatial and Temporal bounds summary recommendations: 
>>>> http://wiki.esipfed.org/index.php/Talk:Attribute_Convention_for_Data_Discovery_1-2_Working#Spatial_and_Temporal_Bounds
>>>> 
>>>> 
> 
> -- 
> *******************************************************
> * Nan Galbraith        Information Systems Specialist *
> * Upper Ocean Processes Group            Mail Stop 29 *
> * Woods Hole Oceanographic Institution                *
> * Woods Hole, MA 02543                 (508) 289-2444 *
> *******************************************************
> 
> 
> 
> 
> 
> -- 
> Philip R. Jones
> Team ERT/STG
> Government Contractor
> National Climatic Data Center, NOAA NESDIS
> Veach-Baley Federal Building
> 151 Patton Ave.
> Asheville, NC 28801-5001 USA
> Voice: +1 828-271-4472  FAX: +1 828-271-4328
> _______________________________________________
> Esip-documentation mailing list
> Esip-documentation at lists.esipfed.org
> http://www.lists.esipfed.org/mailman/listinfo/esip-documentation
