[Esip-documentation] status/open issues for ACDD approval

John Graybeal via Esip-documentation esip-documentation at lists.esipfed.org
Thu Aug 28 15:49:04 EDT 2014


Philip,

No worries about the late date; if we can make it noticeably better, I don't think anyone will mind a small delay in finalizing. But let's push to wrap up at this next meeting if we can!

Regarding date_product_[generated|distributed|released] : I didn't care for 'distributed' because the same product can be distributed multiple times; and I didn't care for 'released' because that word often has a formal meaning (in opposition to unreleased). Anna and I came up with date_product_available -- how does that work for you? The definition, now with further clarifications, is 

> date_product_available : The date on which this individual data file or other product was made available (ISO 8601 format); corresponds to ISO 19115-2 CI_DateTypeCode of "publication". This can be the date the product was originally created in some systems; for others, it may be the date the product was (first) provided to a user. This means the availability date may be after the product was first created; therefore the date_content_modified and date_values_modified should be used to assess the age of the content. 
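
To make the intended use concrete, here is a minimal Python sketch of how a harvesting client might read these attributes. It is only an illustration under stated assumptions: the attribute values, the helper names (parse_iso8601, needs_fetch, content_age_days), and the choice to treat dates without a time zone as UTC are all hypothetical, not part of the proposed definition.

```python
from datetime import datetime, timezone

def parse_iso8601(text):
    """Parse an ISO 8601 date or date-time; naive values are treated as UTC (an assumption)."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11, so normalize it.
    parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed

def needs_fetch(attrs, last_harvest):
    """True if the product became available after the client's last visit."""
    return parse_iso8601(attrs["date_product_available"]) > last_harvest

def content_age_days(attrs, now):
    """Age of the data values, taken from date_values_modified rather than availability."""
    return (now - parse_iso8601(attrs["date_values_modified"])).days

# Hypothetical global attributes of one data file
attrs = {
    "date_product_available": "2014-08-20T12:00:00Z",
    "date_content_modified": "2014-08-01T00:00:00Z",
    "date_values_modified": "2014-07-15T00:00:00Z",
}
last_harvest = datetime(2014, 8, 19, tzinfo=timezone.utc)
now = datetime(2014, 8, 28, tzinfo=timezone.utc)

print(needs_fetch(attrs, last_harvest))  # True: made available after the last harvest
print(content_age_days(attrs, now))      # 44: the values are older than the availability date
```

The point of the sketch is that the availability date answers "do I already have this?", while the age of the content has to come from the *_modified attributes.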

Let's pose the question to the group of whether date_product_generated adds value, for the purposes you identify (provenance tracking and managing additional or replacement files). I assume we are trying to assess this from the external user's perspective, and allow for file and web service protocols.  My take: Knowing when the file was created provides no inherent advantage to the user receiving that file unless he or she knows the mechanism by which the system creates files, and that the mechanism won't change. (Obviously the data system that creates and publishes the file could tie its provenance records to the file creation time, if it keeps data in files internally; but it could equally well tie it to the availability time, or a unique ID, or the provenance could be much more atomic than a whole data file.)  I'm not sure which use case you mean by 'managing additional or replacement files'; again from the user's perspective, I think all the use cases for that are addressed with the existing three attributes. Happy to work this through offline if that helps.

Regarding date_[content|values]_modified : The terms 'data' and 'metadata' are ambiguous in most contexts, including this one; I would not like those terms myself. Assuming we are trying to satisfy the primary use cases of "when did _anything_ change?" and "when did the values change?", maybe we can improve the first by replacing 'content' with 'product': date_product_modified. 

I can't think of a better term than 'values'; 'variables' to many includes the variable attributes, which we are explicitly trying to exclude. Since the definition is the important thing, maybe we can choose from a list of possible name pairs at the next meeting?  What choices would you add to the following?
1)  date_content_modified, date_values_modified
2)  date_product_modified, date_values_modified
3)  date_data_modified, date_metadata_modified
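
Whichever pair of names we pick, the two use cases ("when did _anything_ change?" and "when did the values change?") could be checked along these lines. This is a rough Python sketch, assuming the names from option 1 and date-times without time zones; the function what_changed and the sample values are made up for illustration.

```python
from datetime import datetime

def what_changed(attrs, since):
    """Classify changes after `since` as 'values', 'other content only', or 'nothing'."""
    content = datetime.fromisoformat(attrs["date_content_modified"])
    values = datetime.fromisoformat(attrs["date_values_modified"])
    if values > since:
        # The data values themselves changed (which is also a content change).
        return "values"
    if content > since:
        # Only attributes or formatting changed; the values are untouched.
        return "other content only"
    return "nothing"

# Hypothetical attribute values for one file
attrs = {
    "date_content_modified": "2014-08-20T00:00:00",
    "date_values_modified": "2014-07-15T00:00:00",
}

print(what_changed(attrs, datetime(2014, 7, 1)))   # values
print(what_changed(attrs, datetime(2014, 8, 1)))   # other content only
print(what_changed(attrs, datetime(2014, 8, 25)))  # nothing
```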


John
  




On Aug 22, 2014, at 06:35, Philip Jones - NOAA Affiliate <philip.jones at noaa.gov> wrote:

> John, thanks for your responses.
> 
> If that is the intended meaning of date_product_generated, then I agree the attribute name should better reflect that meaning. If you want to avoid using the word "issued", then maybe use "released" or "published" in the name, for example date_released or date_published. I suggest these because "distributed" can be understood as an ongoing activity, whereas "released" or "published" implies the initial distribution of a particular version.
> 
> I'm still not sure the definitions of date_content_modified and date_values_modified would be apparent to users of a data file. What about simply using date_data_modified and date_metadata_modified? 
> 
> The rationale for deprecating the date_created attribute on the ACDD page says:
> "date_created:deleted in favor of date_product_generated (which used to be date_issued); we did not have a use case for knowing the date a stream or product was _first_ generated, once it has been updated"
> Having the producer's initial create date of a file is important for provenance tracking and for managing additional or replacement files that may be created. Only using the *_modified date attributes creates a dependency on using the history attribute correctly with change details in order to determine the original create date of a file.
> 
> I apologize for the late comments and do not wish to delay plans for this ACDD version. I'll try to make the next group call.
>  
> Phil
> 
> 
> On Thu, Aug 21, 2014 at 4:16 PM, John Graybeal <jgraybeal at mindspring.com> wrote:
> Hi Philip, thanks for your input. Here are my thoughts, looking for feedback from you and the list.
> 
>> date_product_generated:  Is this attribute intended to hold the initial create date of the file?
> 
> No, it was meant to be when it was distributed (the separation you wanted). This corresponds originally to the ISO 19115-2 code /gmd:dateType/gmd:CI_DateTypeCode="publication", which says here "date identifies when the resource was issued". To me this means 'visibly released' to the external users (but some say 'issued' means 'produced' in the system). 
> 
> As I understand it, the ACDD attribute targets the use case "if I went to the site on date X, was it there yet?" This is helpful for people or computers that grab information from a site every so often, so they know what they don't have to grab. 
> 
> So I agree the word 'generated' is confusing here; I can't find a discussion where it changed from _issued to _generated, but I think it was an attempt to avoid the ambiguity of the ISO term 'issued'. 
> 
> Perhaps this is better:
> 
> date_product_distributed: The date on which this individual data file or other product was distributed (ISO 8601 format). This may be after the product was created (but not before); therefore the date_content_modified and date_values_modified should be used to assess the age of the content. 
> 
> (I wanted to add "If the identical data file or product is distributed multiple times, this should be the first date of distribution." But it is pretty wordy already.)
> 
>> date_content_modified, date_values_modified
> 
>> Both definitions mention changes to the "data", which I presume means changes to variables in the file. Can the definitions and maybe the attribute names be clarified so that the differences between them are clear? Suggest using terminology from the netCDF data model. 
> 
> Well, that might be more precise, if we can agree. I'm a little nervous proposing a change, but let's see what people say about just changing 'data' to 'variables' and 'metadata' to 'attributes':
> 
> date_content_modified:  The date on which any of the provided content, including variables, attributes, and presented format, was last created or changed (ISO 8601 format) 
> 
> date_values_modified: The date on which the provided variables' data values were last created or changed; excludes attributes and formatting changes (ISO 8601 format) 
> 
>> can you add the original version 1 from 2005 to the wiki
> 
> Good suggestion. As discussed on the call, we'll add this.
> 
> John
> 
> 
> 
> On Aug 21, 2014, at 10:58, Philip Jones - NOAA Affiliate via Esip-documentation <esip-documentation at lists.esipfed.org> wrote:
> 
>> John, all,
>> 
>> I have a few late comments/questions on the date attributes.
>> 
>> Attribute: 
>> date_product_generated 
>>     The date on which this data file or product was produced/distributed (ISO 8601 format). While this date is like a file timestamp, the date_content_modified and date_values_modified should be used to assess the age of the contents of the file or product. 
>> 
>> Comment: 
>> The date-time a file was "produced" (generated) is not the same as when it was "distributed", because not all datasets are distributed in real-time. Many datasets are produced/generated weeks prior to their distribution. I recommend separating produced from distributed, which suggests that date_issued is still relevant. Is this attribute intended to hold the initial create date of the file?
>> 
>> Attributes: 
>> date_content_modified 
>>     The date on which any of the provided content, including data, metadata, and presented format, was last created or changed (ISO 8601 format) 
>> date_values_modified
>>     The date on which the provided data values were last created or changed; excludes metadata and formatting changes (ISO 8601 format) 
>> 
>> Comment: 
>> Both definitions mention changes to the "data", which I presume means changes to variables in the file. Can the definitions and maybe the attribute names be clarified so that the differences between them are clear? Suggest using terminology from the netCDF data model. 
>> 
>> Also, if this group is maintaining a history of all ACDD versions, can you add the original version 1 from 2005 to the wiki? It is no longer hosted at Unidata. Archive link from April 2014: https://web.archive.org/web/20140424133239/http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html
>> 
>> Phil
>> 
>> 
>> On Tue, Aug 19, 2014 at 7:57 PM, John Graybeal via Esip-documentation <esip-documentation at lists.esipfed.org> wrote:
>> Hi all,
>> 
>> In case we get time to consider ACDD Thursday, here are the issues I've seen on the discussion thread and their current status. The page at http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery_1-2_Working contains the recommended changes. 
>> 
>> If there are no further concerns raised, I'd like to do preliminary approval/consent at this call, and schedule the final approval for next call. (If there are still concerns, we can discuss them on-list or on the call.) The approved document can become Version 2, since several people have started calling it that; then everyone is free to work on a groups-aware revision, as they see fit.
>> 
>> A brief reminder: With respect to issues (1) and (2), because ACDD attributes are all recommendations -- there are no 'shall' statements in the document -- people are still within the specification while not using whatever attributes they don't like. So it isn't dysfunctional if there are attributes that some choose to omit, or deprecated terms that some choose to use.
>> 
>> === Open Topics ===
>> 
>> 1) Deprecation of date_* attributes
>> 
>> This relates to the deprecation of
>>     date_created, date_issued, date_modified 
>> attributes, while adding (not 1 for 1) 
>>     date_content_modified, date_values_modified, date_product_generated.
>> 
>> This topic was previously summarized in email; review that summary on the talk page[1]. If there continue to be concerns, we can vote on the best answer.
>> 
>> 2) Adoption of summary metadata for geospatiotemporal ranges (good, tolerable, or bad?)
>>     
>> Extensive discussion led to an explicit section addressing key software principles[2], and some warning text.  I have not received any critical comments since the last round of changes. (I think one critic is satisfied, another perhaps just silent. :->)  If concerns remain, we can discuss and vote.
>> 
>> 3) Organization of ACDD pages 
>> 
>> There is a bit of confusion still with the current organization. I hesitated to go wild with fixes myself, but now that I'm co-chair with Anna, I think we can just fix issues as they are identified. If you have an issue with the ACDD organization, can you please send it to the list or us, as you prefer?  With approval a lot will become more transparent.
>> 
>> John
>> 
>> 
>> 
>> 
>> 
>> 
>> [1] Summary of date_* attribute concerns: http://wiki.esipfed.org/index.php/Talk:Attribute_Convention_for_Data_Discovery_1-2_Working#Attributes_Discussed_and_Resolved
>> 
>> [2] Spatial and Temporal bounds summary recommendations: 
>> http://wiki.esipfed.org/index.php/Talk:Attribute_Convention_for_Data_Discovery_1-2_Working#Spatial_and_Temporal_Bounds
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Esip-documentation mailing list
>> Esip-documentation at lists.esipfed.org
>> http://www.lists.esipfed.org/mailman/listinfo/esip-documentation
>> 
>> 
>> 
>> 
>> -- 
>> Philip R. Jones
>> Team ERT/STG
>> Government Contractor
>> National Climatic Data Center, NOAA NESDIS
>> Veach-Baley Federal Building
>> 151 Patton Ave.
>> Asheville, NC 28801-5001 USA
>> Voice: +1 828-271-4472  FAX: +1 828-271-4328
> 
> 
> 
> 
> -- 
> Philip R. Jones
> Team ERT/STG
> Government Contractor
> National Climatic Data Center, NOAA NESDIS
> Veach-Baley Federal Building
> 151 Patton Ave.
> Asheville, NC 28801-5001 USA
> Voice: +1 828-271-4472  FAX: +1 828-271-4328
