[Esip-documentation] status/open issues for ACDD approval
Nan Galbraith via Esip-documentation
esip-documentation at lists.esipfed.org
Fri Sep 19 10:03:04 EDT 2014
Hi Phil and all -
For files containing observational data, the descriptors 'released',
'published' and 'issued' are very nearly meaningless terms.
My NetCDF data files each have a single 'version date' that reflects the
last time the observations or derived data values were changed by editing
or by applying new calibrations or new algorithms, and a 'file date' that
covers the above plus any re-write that was due to changes in some format
spec (like ACDD) or maybe a semantic error (such as a missed standard name).
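A minimal sketch of that distinction, for illustration only: the attribute
names version_date and file_date and the helper update_dates are hypothetical
(they are not ACDD terms), and a plain dict stands in for a file's global
attributes. The file date moves on every re-write; the version date moves
only when the data values themselves change.

```python
from datetime import datetime, timezone


def iso8601(dt):
    """Format an aware datetime as an ISO 8601 UTC string."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def update_dates(attrs, when, values_changed):
    """Return a copy of the global attributes, updated for a re-write at `when`.

    file_date (hypothetical name) advances on every re-write.
    version_date (hypothetical name) advances only when values changed,
    or when it has never been set.
    """
    updated = dict(attrs)
    stamp = iso8601(when)
    updated["file_date"] = stamp
    if values_changed or "version_date" not in updated:
        updated["version_date"] = stamp
    return updated


# First write: new calibrations applied, so both dates are set.
attrs = update_dates(
    {}, datetime(2013, 5, 1, 12, 0, tzinfo=timezone.utc), values_changed=True
)
# Later re-write for a format-spec change only: just file_date moves.
attrs = update_dates(
    attrs, datetime(2014, 9, 19, 14, 3, 4, tzinfo=timezone.utc), values_changed=False
)
print(attrs)
```

Under these assumptions the version date plays roughly the role proposed for
date_values_modified and the file date the role of date_content_modified.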
We have no concept of a date on which something was originally available,
either - assorted subsets of our surface mooring data are released in real
time, while other parts take years to be published.
Originally, I tried to use the terms you've listed, which were in Ethan's
first cut of ACDD, but I had to make up definitions for them - because they
really have no inherent meaning for my data. I think this is true of most
observational and/or in situ data.
Do we need 2 sets of 'file date' terms, one for products/models and one for
observational data sets?
Regards-
Nan Galbraith
On 9/18/14 1:56 PM, Philip Jones - NOAA Affiliate via Esip-documentation wrote:
> John,
>
> Thanks for considering my comments.
>
> I don't see why the descriptors "released", "published" or "issued"
> are a problem for describing the data publication date. Most products
> have an associated version number and by definition this attribute
> would be the date that version was released. (Maybe we need to add an
> attribute for product version number.) I have two concerns with the
> current proposal, date_product_available. 1) Attributes should not
> have a compound purpose, e.g., date the product was originally created
> or made available. These can be two distinct dates. 2) The meaning of
> these attributes must be obvious to users by the attribute name alone,
> and I'm not sure the intended meaning of "available" would be obvious.
> Note that users of these netCDF files will not refer to the ACDD pages
> to look up the attribute definitions. The netCDF creators possibly
> will, but not users.
>
> I very much suggest we keep an attribute that supports the original
> create date. Just about every metadata standard, from science data to
> documents, images and video, supports the concept of date created and
> date modified. What do we gain by removing it from ACDD?
>
> I can discuss it more in our meeting.
>
> Phil
>
> On Thu, Aug 28, 2014 at 3:49 PM, John Graybeal
> <jgraybeal at mindspring.com> wrote:
>
> Philip,
>
> No worries about the late date, if we can make it noticeably
> better I don't think anyone will mind a small delay in finalizing.
> But push to wrap up at this next meeting if we can!
>
> /Regarding date_product_[generated|distributed|released] /: I
> didn't care for 'distributed' because the same product can be
> distributed multiple times; and I didn't care for 'released'
> because that word often has a formal meaning (in opposition to
> unreleased). Anna and I came up with *date_product_available* --
> how does that work for you? The definition, now with further
> clarifications, is
>
>> *date_product_available* : The date on which this individual data
>> file or other product was made available (ISO 8601 format);
>> corresponds to ISO 19115-2 CI_DateTypeCode of "publication". This
>> can be the date the product was originally created in some
>> systems; for others, it may be the date the product was (first)
>> provided to a user. This means the availability date may be after
>> the product was first created; therefore the
>> date_content_modified and date_values_modified should be used to
>> assess the age of the content.
>
> Let's pose the question to the group of whether
> *date_product_generated* adds value, for the purposes you identify
> (provenance tracking and managing additional or replacement
> files). I assume we are trying to assess this from the external
> user's perspective, and allow for file and web service protocols.
> My take: Knowing when the file was created provides no inherent
> advantage to the user receiving that file unless he or she knows
> the mechanism by which the system creates files, and that the
> mechanism won't change. (Obviously the data system that creates
> and publishes the file could tie its provenance records to the
> file creation time, if it keeps data in files internally; but it
> could equally well tie it to the availability time, or a unique
> ID, or the provenance could be much more atomic than a whole data
> file.) I'm not sure which use case you mean by 'managing
> additional or replacement files'; again from the user's
> perspective, I think all the use cases for that are addressed with
> the existing three attributes. Happy to work this through offline
> if that helps.
>
> /Regarding date_[content|values]_modified/ : The terms 'data' and
> 'metadata' are ambiguous in most contexts, including this one; I
> would not like those terms myself. Assuming we are trying to
> satisfy the primary use cases of "when did _anything_ change?" and
> "when did the values change?", maybe we can improve the first by
> replacing 'content' with 'product': *date_product_modified*.
>
> I can't think of a better term than 'values'; 'variables' to many
> includes the variable attributes, which we are explicitly trying
> to exclude. Since the definition is the important thing, maybe we
> can choose from a list of possible name pairs at the next meeting?
> What choices would you add to the following?
> 1) date_content_modified, date_values_modified
> 2) date_product_modified, date_values_modified
> 3) date_data_modified, date_metadata_modified
>
>
> John
>
>
>
>
> On Aug 22, 2014, at 06:35, Philip Jones - NOAA Affiliate
> <philip.jones at noaa.gov> wrote:
>
>> John, thanks for your responses.
>>
>> If that is the intended meaning of date_product_generated, then I
>> agree the attribute name should better reflect that meaning. If
>> you want to avoid using the word "issued", then maybe use
>> "released" or "published" in the name. For example, date_released
>> or date_published. Because "distributed" can be understood as an
>> ongoing activity, whereas "released" or "published" imply the
>> initial distribution of a particular version.
>>
>> I'm still not sure the definitions of date_content_modified and
>> date_values_modified would be apparent to users of a data file.
>> What about simply using date_data_modified and
>> date_metadata_modified?
>>
>> The rationale for deprecating the date_created attribute on the
>> ACDD page says:
>> "date_created:deleted in favor of date_product_generated (which
>> used to be date_issued); we did not have a use case for knowing
>> the date a stream or product was _first_ generated, once it has
>> been updated"
>> Having the producer's initial create date of a file is important
>> for provenance tracking and for managing additional or
>> replacement files that may be created. Only using the *_modified
>> date attributes creates a dependency on using the history
>> attribute correctly with change details in order to determine the
>> original create date of a file.
>>
>> I apologize for the late comments and do not wish to delay plans
>> for this ACDD version. I'll try to make the next group call.
>>
>> Phil
>>
>>
>> On Thu, Aug 21, 2014 at 4:16 PM, John Graybeal
>> <jgraybeal at mindspring.com> wrote:
>>
>> Hi Philip, thanks for your input. Here are my thoughts,
>> looking for feedback from you and the list.
>>
>>> date_product_generated: Is this attribute intended to hold
>>> the initial create date of the file?
>>
>> No, it was meant to be when it was distributed (the
>> separation you wanted). This corresponds originally to the
>> ISO 19115-2 code
>> /gmd:dateType/gmd:CI_DateTypeCode="publication", which says
>> here
>> <https://geo-ide.noaa.gov/wiki/index.php?title=ISO_19115_and_19115-2_CodeList_Dictionaries#CI_DateTypeCode> "date
>> identifies when the resource was issued". (To me this means
>> 'visibly released' to the external users, but some say
>> 'issued' means 'produced' in the system.)
>>
>> As I understand it, the ACDD attribute targets the use case
>> "if I went to the site on date X, was it there yet?" This
>> is helpful for people or computers who grab information
>> from a site every so often, to know what they don't have
>> to grab.
>>
>> So I agree the word 'generated' is confusing here; I can't
>> find a discussion where it changed from _issued to
>> _generated, but I think it was an attempt to avoid the
>> ambiguity of the ISO term 'issued'.
>>
>> Perhaps this is better:
>>
>> *date_product_distributed*: The date on which this individual
>> data file or other product was distributed (ISO 8601 format).
>> This may be after the product was created (but not before);
>> therefore the date_content_modified and date_values_modified
>> should be used to assess the age of the content.
>>
>> (I wanted to add "If the identical data file or product is
>> distributed multiple times, this should be the first date of
>> distribution." But it is pretty wordy already.)
>>
>>> date_content_modified, date_values_modified
>>> Both definitions mention changes to the "data", which I
>>> presume means changes to variables in the file. Can the
>>> definitions and maybe the attribute names be clarified so
>>> that the differences between them are clear? Suggest using
>>> terminology from the netCDF data model
>>> <https://www.unidata.ucar.edu/software/netcdf/docs/html_guide/netcdf_data_set_components.html>.
>>>
>>
>> Well, that might be more precise, if we can agree. I'm a
>> little nervous proposing a change, but let's see what people
>> say about just changing 'data' to 'variables' and 'metadata'
>> to 'attributes':
>>
>> *date_content_modified*: The date on which any of the
>> provided content, including variables, attributes, and
>> presented format, was last created or changed (ISO 8601 format)
>>
>> *date_values_modified*: The date on which the provided
>> variables' data values were last created or changed; excludes
>> attributes and formatting changes (ISO 8601 format)
>>
>>> can you add the original version 1 from 2005 to the wiki
>>
>> Good suggestion. As discussed on the call, we'll add this.
>>
>> John
>>
>>
>>
>> On Aug 21, 2014, at 10:58, Philip Jones - NOAA Affiliate via
>> Esip-documentation <esip-documentation at lists.esipfed.org>
>> wrote:
>>
>>> John, all,
>>>
>>> I have a few late comments/questions on the date attributes.
>>>
>>> Attribute:
>>> date_product_generated
>>> The date on which this data file or product was
>>> produced/distributed (ISO 8601 format). While this date is
>>> like a file timestamp, the date_content_modified and
>>> date_values_modified should be used to assess the age of the
>>> contents of the file or product.
>>>
>>> Comment:
>>> The date-time a file was "produced" (generated) is not the
>>> same as when it was "distributed", because not all datasets
>>> are distributed in real-time. Many datasets are
>>> produced/generated weeks prior to their distribution. I
>>> recommend separating produced from distributed, which
>>> suggests that date_issued is still relevant. Is this
>>> attribute intended to hold the initial create date of the file?
>>>
>>> Attributes:
>>> date_content_modified
>>> The date on which any of the provided content, including
>>> data, metadata, and presented format, was last created or
>>> changed (ISO 8601 format)
>>> date_values_modified
>>> The date on which the provided data values were last
>>> created or changed; excludes metadata and formatting changes
>>> (ISO 8601 format)
>>>
>>> Comment:
>>> Both definitions mention changes to the "data", which I
>>> presume means changes to variables in the file. Can the
>>> definitions and maybe the attribute names be clarified so
>>> that the differences between them are clear? Suggest using
>>> terminology from the netCDF data model
>>> <https://www.unidata.ucar.edu/software/netcdf/docs/html_guide/netcdf_data_set_components.html>.
>>>
>>>
>>> Also, if this group is maintaining a history of all ACDD
>>> versions, can you add the original version 1 from 2005 to
>>> the wiki? It is no longer hosted at Unidata. Archive link
>>> from April 2014:
>>> https://web.archive.org/web/20140424133239/http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html
>>>
>>> Phil
>>>
>>>
>>> On Tue, Aug 19, 2014 at 7:57 PM, John Graybeal via
>>> Esip-documentation <esip-documentation at lists.esipfed.org>
>>> wrote:
>>>
>>> Hi all,
>>>
>>> In case we get time to consider ACDD Thursday, here are
>>> the issues I've seen on the discussion thread and their
>>> current status. The page at
>>> http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery_1-2_Working contains
>>> the recommended changes.
>>>
>>> If there are no further concerns raised, I'd like to do
>>> preliminary approval/consent at this call, and schedule
>>> the final approval for next call. (If there are still
>>> concerns, we can discuss them on-list or on the call.)
>>> The approved document can become Version 2, since
>>> several people have started calling it that; then
>>> everyone is free to work on a groups-aware revision, as
>>> they see fit.
>>>
>>> A brief reminder: With respect to issues (1) and (2),
>>> because ACDD attributes are all recommendations -- there
>>> are no 'shall' statements in the document -- people are
>>> still within the specification while not using whatever
>>> attributes they don't like. So it isn't dysfunctional if
>>> there are attributes that some choose to omit, or
>>> deprecated terms that some choose to use.
>>>
>>> === Open Topics ===
>>>
>>> 1) Deprecation of date_* attributes
>>>
>>> This relates to the deprecation of the
>>> date_created, date_issued, and date_modified
>>> attributes, while adding (not 1 for 1)
>>> date_content_modified, date_values_modified, and
>>> date_product_generated.
>>>
>>> This topic was previously summarized in email; review
>>> that summary on the talk page[1]. If there continue to
>>> be concerns, we can vote on the best answer.
>>>
>>> 2) Adoption of summary metadata for geospatiotemporal
>>> ranges (good, tolerable, or bad?)
>>> Extensive discussion led to an explicit section
>>> addressing key software principles[2], and some warning
>>> text. I have not received any critical comments since
>>> the last round of changes. (I think one critic is
>>> satisfied, another perhaps just silent. :->) If
>>> concerns remain, we can discuss and vote.
>>>
>>> 3) Organization of ACDD pages
>>>
>>> There is a bit of confusion still with the current
>>> organization. I hesitated to go wild with fixes myself,
>>> but now that I'm co-chair with Anna, I think we can just
>>> fix issues as they are identified. If you have an issue
>>> with the ACDD organization, can you please send it to
>>> the list or us, as you prefer? With approval a lot will
>>> become more transparent.
>>>
>>> John
>>>
>>>
>>>
>>>
>>>
>>>
>>> [1] Summary of date_* attribute concerns:
>>> http://wiki.esipfed.org/index.php/Talk:Attribute_Convention_for_Data_Discovery_1-2_Working#Attributes_Discussed_and_Resolved
>>>
>>> [2] Spatial and Temporal bounds summary recommendations:
>>> http://wiki.esipfed.org/index.php/Talk:Attribute_Convention_for_Data_Discovery_1-2_Working#Spatial_and_Temporal_Bounds
>>>
>>>
--
*******************************************************
* Nan Galbraith Information Systems Specialist *
* Upper Ocean Processes Group Mail Stop 29 *
* Woods Hole Oceanographic Institution *
* Woods Hole, MA 02543 (508) 289-2444 *
*******************************************************