[Esip-documentation] status/open issues for ACDD approval

Philip Jones - NOAA Affiliate via Esip-documentation esip-documentation at lists.esipfed.org
Fri Sep 19 11:53:25 EDT 2014


Nan,

Thanks for sharing your use cases. I can see where it might be difficult or
impossible to know the date when your data will become publicly available.
However, even if an attribute is not applicable for your data, it should
not be deprecated if it is applicable to other relevant data. Any
deprecation should be done cautiously and with good justification.

Also, note that the ACDD versions 1.0-1.3 already support the concept of
data publication with the Recommended set of attributes "publisher" and
"publisher_email". So if data can be published then it can have a
publication date. The same can be said for the "creator_*" attributes and
the create date.

This is the way I have interpreted and used the version 1.0 date-related
attributes for a given dataset version*:
1) time_coverage_* (Recommended): time(s) of data coverage and resolution
(not in question at this time)
2) date_created (Recommended): date the dataset was created/completed
3) date_issued (Suggested): date the dataset was first made available to
the public
4) date_modified (Suggested) and history (Recommended): used in combination
to document modifications to the original data
*If you have another version of the data, then it should have new values
for the above attributes. I don't think we need two sets of date attributes
for observational data and derived products/models.
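
As a rough illustration of that interpretation, here is a minimal Python
sketch that sanity-checks the date attributes for one dataset version. The
attribute values are made up, the check_date_order helper is hypothetical
(not part of ACDD or any library), and the rule that the create date should
not fall after the issued or modified date is my assumption, not an ACDD
requirement.

```python
from datetime import date

# Hypothetical global attributes for one dataset version; the values are
# invented for illustration and not taken from any real file.
acdd_dates = {
    "date_created": "2014-08-01",   # dataset created/completed
    "date_issued": "2014-08-15",    # first made available to the public
    "date_modified": "2014-09-10",  # latest change, detailed in "history"
}


def check_date_order(attrs):
    """Return a list of problems found in the date attributes.

    Assumption of this sketch: date_created should not be later than
    date_issued or date_modified. ACDD itself does not mandate this.
    """
    problems = []
    parsed = {}
    for name, value in attrs.items():
        try:
            # Accepts plain ISO 8601 calendar dates such as 2014-09-19.
            parsed[name] = date.fromisoformat(value)
        except ValueError:
            problems.append(f"{name} is not an ISO 8601 date")
    created = parsed.get("date_created")
    for name in ("date_issued", "date_modified"):
        if created is not None and name in parsed and parsed[name] < created:
            problems.append(f"{name} is before date_created")
    return problems


print(check_date_order(acdd_dates))
```

A new version of the data would get its own set of values, so the same
check would simply be run again on that version's attributes.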

My main preference going forward is to have/keep attributes that support
the notion of an original create date and publication date. The create date
should probably be Recommended and the publication date Suggested. Either
way, they're not mandatory but remain available to use as needed. I don't
have a strong opinion on the new modified date attributes, other than that
each attribute's intended meaning should be obvious to users from its
name.

In case you missed yesterday's meeting, John will be sending a review
comment spreadsheet for version 1.3 and having a review discussion on 10/2.

Phil

On Fri, Sep 19, 2014 at 10:03 AM, Nan Galbraith <ngalbraith at whoi.edu> wrote:

>  Hi Phil and all -
>
> For files containing observational data, the descriptors 'released',
> 'published' and 'issued' are very nearly meaningless terms.
>
> My NetCDF data files each have a single 'version date' that reflects the
> last time the observations or derived data values were changed by editing,
> or applying new calibrations or new algorithms, and a 'file date' that
> includes the above plus any re-write that was due to changes in some format
> spec (like ACDD) or maybe a semantic error (such as a missed standard name).
>
> We have no concept of a date on which something was originally available,
> either - assorted subsets of our surface mooring data are released in real
> time, other parts take years to be published.
>
> Originally, I tried to use the terms you've listed, which were in Ethan's
> first cut of ACDD, but I had to make up definitions for them - because they
> really have no inherent meaning for my data. I think this is true of most
> observational and/or in situ data.
>
> Do we need 2 sets of 'file date' terms, one for products/models and one for
> observational data sets?
>
> Regards-
> Nan Galbraith
>
>
> On 9/18/14 1:56 PM, Philip Jones - NOAA Affiliate via Esip-documentation
> wrote:
>
>  John,
>
> Thanks for considering my comments.
>
> I don't see why the descriptors "released", "published" or "issued" are a
> problem for describing the data publication date. Most products have an
> associated version number and by definition this attribute would be the
> date that version was released. (Maybe we need to add an attribute for
> product version number.) I have two concerns with the current proposal,
> date_product_available. 1) Attributes should not have a compound purpose,
> e.g., date the product was originally created or made available. These can
> be two distinct dates. 2) The meaning of these attributes must be obvious
> to users by the attribute name alone, and I'm not sure the intended meaning
> of "available" would be obvious. Note that users of these netCDF files will
> not refer to the ACDD pages to look up the attribute definitions. The
> netCDF creators possibly will, but not users.
>
> I very much suggest we keep an attribute that supports the original create
> date. Just about every metadata standard, from science data to documents,
> images and video, supports the concept of date created and date modified.
> What do we gain by removing it from ACDD?
>
> I can discuss it more in our meeting.
>
>  Phil
>
> On Thu, Aug 28, 2014 at 3:49 PM, John Graybeal <jgraybeal at mindspring.com>
> wrote:
>
>> Philip,
>>
>>  No worries about the late date, if we can make it noticeably better I
>> don't think anyone will mind a small delay in finalizing. But push to wrap
>> up at this next meeting if we can!
>>
>>  *Regarding date_product_[generated|distributed|released]*: I didn't
>> care for 'distributed' because the same product can be distributed multiple
>> times; and I didn't care for 'released' because that word often has a
>> formal meaning (in opposition to unreleased). Anna and I came up with
>> *date_product_available* -- how does that work for you? The definition,
>> now with further clarifications, is
>>
>>  *date_product_available* : The date on which this individual data file
>> or other product was made available (ISO 8601 format); corresponds to ISO
>> 19115-2 CI:DateTypeCode of "publication". This can be the date the product
>> was originally created in some systems; for others, it may be the date the
>> product was (first) provided to a user. This means the availability date
>> may be after the product was first created; therefore the
>> date_content_modified and date_values_modified should be used to assess the
>> age of the content.
>>
>>
>>  Let's pose the question to the group of whether *date_product_generated*
>> adds value, for the purposes you identify (provenance tracking and managing
>> additional or replacement files). I assume we are trying to assess this
>> from the external user's perspective, and allow for file and web service
>> protocols.  My take: Knowing when the file was created provides no inherent
>> advantage to the user receiving that file unless he or she knows the
>> mechanism by which the system creates files, and that the mechanism won't
>> change. (Obviously the data system that creates and publishes the file
>> could tie its provenance records to the file creation time, if it keeps
>> data in files internally; but it could equally well tie it to the
>> availability time, or a unique ID, or the provenance could be much more
>> atomic than a whole data file.)  I'm not sure which use case you mean by
>> 'managing additional or replacement files'; again from the user's
>> perspective, I think all the use cases for that are addressed with the
>> existing three attributes. Happy to work this through offline if that helps.
>>
>>  *Regarding date_[content|values]_modified* : The terms 'data' and
>> 'metadata' are ambiguous in most contexts, including this one; I would not
>> like those terms myself. Assuming we are trying to satisfy the primary use
>> cases of "when did _anything_ change?" and "when did the values change?",
>> maybe we can improve the first by replacing 'content' with 'product':
>> *date_product_modified*.
>>
>>  I can't think of a better term than 'values'; 'variables' to many
>> includes the variable attributes, which we are explicitly trying to
>> exclude. Since the definition is the important thing, maybe we can choose
>> from a list of possible name pairs at the next meeting?  What choices would
>> you add to the following?
>> 1)  date_content_modified, date_values_modified
>> 2)  date_product_modified, date_values_modified
>> 3)  date_data_modified, date_metadata_modified
>>
>>
>>  John
>>
>>
>>
>>
>>
>>  On Aug 22, 2014, at 06:35, Philip Jones - NOAA Affiliate <
>> philip.jones at noaa.gov> wrote:
>>
>>  John, thanks for your responses.
>>
>> If that is the intended meaning of date_product_generated, then I agree
>> the attribute name should better reflect that meaning. If you want to avoid
>> using the word "issued", then maybe use "released" or "published" in the
>> name. For example, date_released or date_published. Because "distributed"
>> can be understood as an ongoing activity, whereas "released" or "published"
>> imply the initial distribution of a particular version.
>>
>> I'm still not sure the definitions of date_content_modified and
>> date_values_modified would be apparent to users of a data file. What about
>> simply using date_data_modified and date_metadata_modified?
>>
>> The rationale for deprecating the date_created attribute on the ACDD page
>> says:
>> "date_created:deleted in favor of date_product_generated (which used to
>> be date_issued); we did not have a use case for knowing the date a stream
>> or product was _first_ generated, once it has been updated"
>> Having the producer's initial create date of a file is important for
>> provenance tracking and for managing additional or replacement files that
>> may be created. Only using the *_modified date attributes creates a
>> dependency on using the history attribute correctly with change details in
>> order to determine the original create date of a file.
>>
>>  I apologize for the late comments and do not wish to delay plans for
>> this ACDD version. I'll try to make the next group call.
>>
>>  Phil
>>
>>
>>  On Thu, Aug 21, 2014 at 4:16 PM, John Graybeal <jgraybeal at mindspring.com
>> > wrote:
>>
>>> Hi Philip, thanks for your input. Here are my thoughts, looking for
>>> feedback from you and the list.
>>>
>>>   date_product_generated:  Is this attribute intended to hold the
>>> initial create date of the file?
>>>
>>>
>>>  No, it was meant to be when it was distributed (the separation you
>>> wanted). This corresponds originally to the ISO 19115-2 code /gmd:dateType/gmd:CI_DateTypeCode="publication",
>>> which says here
>>> <https://geo-ide.noaa.gov/wiki/index.php?title=ISO_19115_and_19115-2_CodeList_Dictionaries#CI_DateTypeCode> "date
>>> identifies when the resource was issued". (To me this means 'visibly
>>> released' to the external users, but some say 'issued' means 'produced' in
>>> the system).
>>>
>>>  As I understand it, the ACDD attribute targets the use case "if I went
>>> to the site on date X, was it there yet?" This is helpful for people or
>>> computers that grab information from a site every so often, to know what
>>> they don't have to grab.
>>>
>>>  So I agree the word 'generated' is confusing here; I can't find a
>>> discussion where it changed from _issued to _generated, but I think it was
>>> an attempt to avoid the ambiguity of the ISO term 'issued'.
>>>
>>>  Perhaps this is better:
>>>
>>>  *date_product_distributed*: The date on which this individual data
>>> file or other product was distributed (ISO 8601 format). This may be after
>>> the product was created (but not before); therefore the
>>> date_content_modified and date_values_modified should be used to assess the
>>> age of the content.
>>>
>>>  (I wanted to add "If the identical data file or product is distributed
>>> multiple times, this should be the first date of distribution." But it is
>>> pretty wordy already.)
>>>
>>>   date_content_modified, date_values_modified
>>>
>>>    Both definitions mention changes to the "data", which I presume
>>> means changes to variables in the file. Can the definitions and maybe the
>>> attribute names be clarified so that the differences between them are
>>> clear? Suggest using terminology from the netCDF data model
>>> <https://www.unidata.ucar.edu/software/netcdf/docs/html_guide/netcdf_data_set_components.html>
>>> .
>>>
>>>
>>>  Well, that might be more precise, if we can agree. I'm a little
>>> nervous proposing a change, but let's see what people say about just
>>> changing 'data' to 'variables' and 'metadata' to 'attributes':
>>>
>>>  *date_content_modified*:  The date on which any of the provided
>>> content, including variables, attributes, and presented format, was last
>>> created or changed (ISO 8601 format)
>>>
>>> *date_values_modified*: The date on which the provided variables' data
>>> values were last created or changed; excludes attributes and formatting
>>> changes (ISO 8601 format)
>>>
>>>   can you add the original version 1 from 2005 to the wiki
>>>
>>>
>>>  Good suggestion. As discussed on the call, we'll add this.
>>>
>>>  John
>>>
>>>
>>>
>>>  On Aug 21, 2014, at 10:58, Philip Jones - NOAA Affiliate via
>>> Esip-documentation <esip-documentation at lists.esipfed.org> wrote:
>>>
>>>  John, all,
>>>
>>>  I have a few late comments/questions on the date attributes.
>>>
>>>  Attribute:
>>> date_product_generated
>>>     The date on which this data file or product was produced/distributed
>>> (ISO 8601 format). While this date is like a file timestamp, the
>>> date_content_modified and date_values_modified should be used to assess the
>>> age of the contents of the file or product.
>>>
>>> Comment:
>>> The date-time a file was "produced" (generated) is not the same as when
>>> it was "distributed", because not all datasets are distributed in
>>> real-time. Many datasets are produced/generated weeks prior to their
>>> distribution. I recommend separating produced from distributed, which
>>> suggests that date_issued is still relevant. Is this attribute intended to
>>> hold the initial create date of the file?
>>>
>>> Attributes:
>>> date_content_modified
>>>     The date on which any of the provided content, including data,
>>> metadata, and presented format, was last created or changed (ISO 8601
>>> format)
>>> date_values_modified
>>>     The date on which the provided data values were last created or
>>> changed; excludes metadata and formatting changes (ISO 8601 format)
>>>
>>> Comment:
>>> Both definitions mention changes to the "data", which I presume means
>>> changes to variables in the file. Can the definitions and maybe the
>>> attribute names be clarified so that the differences between them are
>>> clear? Suggest using terminology from the netCDF data model
>>> <https://www.unidata.ucar.edu/software/netcdf/docs/html_guide/netcdf_data_set_components.html>.
>>>
>>>
>>> Also, if this group is maintaining a history of all ACDD versions, can
>>> you add the original version 1 from 2005 to the wiki? It is no longer
>>> hosted at Unidata. Archive link from April 2014:
>>> https://web.archive.org/web/20140424133239/http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html
>>>
>>>  Phil
>>>
>>>
>>> On Tue, Aug 19, 2014 at 7:57 PM, John Graybeal via Esip-documentation <
>>> esip-documentation at lists.esipfed.org> wrote:
>>>
>>>> Hi all,
>>>>
>>>>  In case we get time to consider ACDD Thursday, here are the issues
>>>> I've seen on the discussion thread and their current status. The page at
>>>> http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery_1-2_Working contains
>>>> the recommended changes.
>>>>
>>>>  If there are no further concerns raised, I'd like to do preliminary
>>>> approval/consent at this call, and schedule the final approval for next
>>>> call. (If there are still concerns, we can discuss them on-list or on the
>>>> call.) The approved document can become Version 2, since several people
>>>> have started calling it that; then everyone is free to work on a
>>>> groups-aware revision, as they see fit.
>>>>
>>>>  A brief reminder: With respect to issues (1) and (2), because ACDD
>>>> attributes are all recommendations -- there are no 'shall' statements in
>>>> the document -- people are still within the specification while not using
>>>> whatever attributes they don't like. So it isn't dysfunctional if there are
>>>> attributes that some choose to omit, or deprecated terms that some choose
>>>> to use.
>>>>
>>>>  === Open Topics ===
>>>>
>>>>  1) Deprecation of date_* attributes
>>>>
>>>>  This relates to the deprecation of
>>>>     date_created, date_issued, date_modified
>>>> attributes, while adding (not 1 for 1)
>>>>     date_content_modified, date_values_modified, date_product_generated.
>>>>
>>>>  This topic was previously summarized in email; review that summary on
>>>> the talk page[1]. If there continue to be concerns, we can vote on the best
>>>> answer.
>>>>
>>>>  2) Adoption of summary metadata for geospatiotemporal ranges (good,
>>>> tolerable, or bad?)
>>>>
>>>> Extensive discussion led to an explicit section addressing key software
>>>> principles[2], and some warning text.  I have not received any critical
>>>> comments since the last round of changes. (I think one critic is satisfied,
>>>> another perhaps just silent. :->)  If concerns remain, we can discuss and
>>>> vote.
>>>>
>>>>  3) Organization of ACDD pages
>>>>
>>>>  There is a bit of confusion still with the current organization. I
>>>> hesitated to go wild with fixes myself, but now that I'm co-chair with
>>>> Anna, I think we can just fix issues as they are identified. If you have an
>>>> issue with the ACDD organization, can you please send it to the list or us,
>>>> as you prefer?  With approval a lot will become more transparent.
>>>>
>>>>  John
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>  [1] Summary of date_* attribute concerns:
>>>> http://wiki.esipfed.org/index.php/Talk:Attribute_Convention_for_Data_Discovery_1-2_Working#Attributes_Discussed_and_Resolved
>>>>
>>>>  [2] Spatial and Temporal bounds summary recommendations:
>>>>
>>>> http://wiki.esipfed.org/index.php/Talk:Attribute_Convention_for_Data_Discovery_1-2_Working#Spatial_and_Temporal_Bounds
>>>>
>>>>
>>>>
> --
> *******************************************************
> * Nan Galbraith        Information Systems Specialist *
> * Upper Ocean Processes Group            Mail Stop 29 *
> * Woods Hole Oceanographic Institution                *
> * Woods Hole, MA 02543                 (508) 289-2444 *
> *******************************************************
>
>
>
>


-- 
Philip R. Jones
Team ERT/STG
Government Contractor
National Climatic Data Center, NOAA NESDIS
Veach-Baley Federal Building
151 Patton Ave.
Asheville, NC 28801-5001 USA
Voice: +1 828-271-4472  FAX: +1 828-271-4328