[Esip-documentation] ACDD 1.3 issue: date & time stamps

Nan Galbraith via Esip-documentation esip-documentation at lists.esipfed.org
Thu Oct 9 11:19:35 EDT 2014


Hi all -

I'm not sure if I'll be able to make the meeting this afternoon, sorry
about that, but we're prepping for a deployment (my day job).

On 10/7/14 5:41 PM, Bob Simons - NOAA Federal via Esip-documentation wrote:
> While there are lots of types of artifacts that could be 
> date/timestamped and thus lots of terms for them, isn't that irrelevant?
> Doesn't a given set of ACDD metadata apply to the data it is attached 
> to (e.g., a file, a file with a granule, a file with a collection, a 
> virtual aggregated dataset in THREDDS, near real time, delayed)?
I agree with this, with the caveat that within that entity (e.g., a
file, a file with a granule, etc.) there are different components that
might need date tags: the "data", the "metadata", and the entity itself
(whose date might reflect aggregation, formatting changes, or just
being accessed from a server somewhere).

> And if we use an attribute name with an artifact name, e.g., 
> date_file_modified, doesn't that become an anachronism when the 
> dataset is served in THREDDS? And doesn't an attribute name that 
> doesn't specify the artifact (e.g., date_modified) remain relevant and 
> appropriate?
I don't think so; in fact, I really disagree with this. If you're
aggregating data from TDS, you probably really DO want to retain the
fact that this data came from a file with a modification date of X.
Otherwise, all hope of understanding the provenance of your outputs
is lost.

Maybe, in an entity other than the 'original' file, this info doesn't
need to be stored as a free-standing global date attribute, but should
become part of the history attribute. Would we be re-defining the
history attribute if we suggested that version dates of inputs be
included there? (I think 'history' is not ours to muck around with,
but...)
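To make the history-attribute idea concrete, here is a minimal sketch, assuming a plain Python dict stands in for a file's global attributes. The helper name append_input_provenance and the wording of the line it writes are hypothetical; neither ACDD nor CF prescribes this format.

```python
from datetime import datetime, timezone


def append_input_provenance(attrs, source_name, source_date_modified, now=None):
    """Record an input file's modification date in the 'history' attribute.

    Hypothetical helper: the entry format is illustrative only and is
    not defined by ACDD or CF.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    # Each entry starts with a timestamp, as the netCDF/CF history guidance recommends.
    entry = f"{stamp}: aggregated input {source_name} (date_modified={source_date_modified})"
    existing = attrs.get("history", "")
    # Keep one entry per line; start a new history if none exists yet.
    attrs["history"] = f"{existing}\n{entry}" if existing else entry
    return attrs


# Usage: an aggregator notes which input a value came from and how old it was.
attrs = {"history": "2014-09-30T12:00:00Z: file created by instrument processing"}
fixed_now = datetime(2014, 10, 9, 15, 0, 0, tzinfo=timezone.utc)
append_input_provenance(attrs, "mooring_A.nc", "2014-10-01T08:30:00Z", now=fixed_now)
print(attrs["history"])
```

Appending to history keeps the input's own date_modified available for provenance without adding another free-standing global attribute to the aggregated product.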

Does ACDD need to specify where this information should be stored? We
seem to be having enough trouble defining terms for data originators.
For people who are aggregating and changing data, it's their
responsibility to keep the metadata attached to the appropriate
artifact or granule, but this is a discovery convention and might not
need to get into too much detail on re-use metadata.

Does this problem go away if we define 'date_modified' as V1.0 did, 'date
data modified'? If so, I'm on board with that.

Cheers - Nan

> On 2014-10-07 1:59 PM, John Graybeal via Esip-documentation wrote:
>> For the issue of date and time stamps, I've tried to distill the key 
>> approaches and issues from emails into the Active Issues 
>> <https://docs.google.com/spreadsheets/d/19fl5AgGkckG03yTchUjYUp4YnR09Fn1Nqps2KHenkC4/edit#gid=0> 
>> table (item 9, starts row 71), mostly focusing on concrete proposals. 
>> These are my conclusions so far.
>>
>> 1) Beyond the key 3-4 terms, the need for any particular term is 
>> small; but many need or want *some* other term(s).
>> 2) Thus, considering the 'useful' use cases will create a big set of 
>> functions or 'date types', somewhere between Jim's initial list and 
>> the ISO list.
>> 3) The number of artifacts that we are considering timestamping 
>> (file, data, etc.) is shorter but more than 2; reference list below. 
>> This is a multiplier against the list of useful functions.
>> 4) "I'd like to keep things lean and flexible, and not build out a 
>> complex taxonomy unless we really need one." (Jim B.)
>>
>> This sends me back to the original 3 terms date_* (created, modified, 
>> issued) as a necessary starting point. My strong suspicion is that 
>> casual users assume date_* terms (with no artifact) refer to the 
>> whole file/product, not just the data. So I suggest the default 
>> definitions go that direction, rather than Bob's proposal. But I'll 
>> settle for anything explicit.
>>
>> I note as a matter of process that if we do not achieve consensus on 
>> a useful collection of date/time stamps, we will be stuck with the 
>> original definitions plus whatever modifications can get a 70% vote. 
>> This reinforces my initial focus on whether we can improve the first 
>> 3 definitions, as that will affect how we consider other needed 
>> functions.
>>
>> John
>>
>>
>>
>> Terms mentioned for the artifact to be date/timestamped (with nominal 
>> synonyms by me; they aren't perfect but represent large conceptual 
>> overlaps):
>> - file = product ~ granule instance (Jim, I think your list of 
>> artifacts should include one corresponding directly to file)
>> - data = values;
>> - metadata = attributes;
>> - static dataset;
>> - near real-time dataset;
>> - collection;
>> - granule;
>> - resource ("The term resource is used in 19115-1 as a replacement 
>> for dataset in order to emphasize that metadata can be used at many 
>> levels and for many kinds of things.")
>>
>>
>> On Oct 6, 2014, at 05:31, Ge Peng - NOAA Affiliate via 
>> Esip-documentation <esip-documentation at lists.esipfed.org 
>> <mailto:esip-documentation at lists.esipfed.org>> wrote:
>>
>>> Trying to look at the date issue from a different perspective:
>>>
>>>
>>> Static historical datasets and near real-time datasets have 
>>> different requirements for creation date, as do collection-level 
>>> metadata records and file-level metadata attributes. Search and 
>>> discovery will eventually touch on both collection- and file-level 
>>> metadata in some way.
>>>
>>>
>>> Information about the “original” creation date and modification date 
>>> can be useful, especially for data provenance. However, both 
>>> “original” creation date and “modification” date imply a need for 
>>> keeping track of dates.
>>>
>>>
>>> For static datasets and collection-level metadata records, 
>>> decisions to create and modify them tend to happen less frequently, 
>>> with weak or no latency requirements; they often occur subjectively 
>>> and are documented, so keeping track of these dates may be more 
>>> feasible to implement. Original creation date may be relevant and 
>>> good to have in this case.
>>>
>>>
>>> On the other hand, for near real-time datasets, which usually have 
>>> strong data-latency requirements, and for file-level metadata such 
>>> as the global attributes of a NetCDF file in a near real-time 
>>> dataset, it may not be feasible to keep track of the original 
>>> creation date, although a version number/date could be used. The 
>>> file creation date/time stamp may be more appropriate in this case.
>>>
>>>
>>> Following the discussion thread on date and time stamps, and based 
>>> on my experience working with both static and near real-time 
>>> datasets, it has become apparent to me that a date type element may 
>>> offer the flexibility to implement different types of dates for 
>>> different types of data, as it may be nearly impossible for us to 
>>> find a one-size-fits-all solution.
>>>
>>>
>>> Coming late to the discussions and not wanting to stir up any 
>>> additional discussion, I withheld my comment until our last meeting, 
>>> when the type element for creator was suggested. Now, with Jim’s 
>>> suggestion to approach this issue in a more systematic way, I am 
>>> putting my suggestion in. Hopefully it will help us reach a more 
>>> consistent decision, as it will have long-lasting implications for 
>>> everyone: providers, stewards, developers, and users.
>>>
>>>
>>> Best regards,
>>>
>>>
>>> --- Peng
>>>
>>>
>>> P.S.: I was thinking about it over the weekend but decided to post 
>>> my comment this morning, so it may overlap with Ted's and John’s 
>>> comments.
>>>
>>>
>>> On Fri, Oct 3, 2014 at 3:54 PM, John Graybeal via Esip-documentation 
>>> <esip-documentation at lists.esipfed.org 
>>> <mailto:esip-documentation at lists.esipfed.org>> wrote:
>>>
>>>     The issue of date/timestamps is by far the most challenging ACDD
>>>     issue.
>>>
>>>     A very short version of the very long 1.3 history:
>>>     A) A year-plus ago, several members identified ambiguities in
>>>     definition, understanding, and use of the originals
>>>     (*date_created*, *date_modified*, *date_issued*)
>>>     B) An extensive analysis/discussion took place over the course
>>>     of a year; the opinion of that group was that new terms should
>>>     be created in place of the old. These terms were
>>>     *date_content_modified*, *date_values_modified*,
>>>     *date_product_generated*. A fourth was proposed but not settled
>>>     on, a la *date_product_originally_created*. Another request
>>>     relative to this group was for a publication date (which could
>>>     be *date_issued* or something else, depending on definitions.)
>>>     C) The broader group's recent (general) decision was that
>>>     existing terms should be kept, but redefined if necessary.
>>>     D) Our 10-minute attempt yesterday helped bring out some of the
>>>     original and ongoing issues.
>>>     E) Jim Biard's email this morning ("ACDD date attributes
>>>     question") takes a back-to-fundamentals approach, laying out some
>>>     suggested use cases and concepts.
>>>
>>>     For reference (only!), I've provided below definitions of these
>>>     terms more or less as they originated or were improved.
>>>
>>>     From a process perspective I think we might want to have a
>>>     separate sub-group hash out a recommendation, but I think Jim's
>>>     idea of gathering requirements is a necessary first step anyway. So
>>>     I propose
>>>     (a) you *respond to Jim's email* *with your inputs to his
>>>     summary*, and
>>>     (b) *reply to this email with any other comments or analysis*,
>>>     especially about process.
>>>
>>>     I do not have any clever ideas to make this topic more easily
>>>     resolvable, given the constraints in (A) and (C) above. Let's
>>>     start the discussion and see how it goes.
>>>
>>>     John
>>>
>>>     ====================
>>>
>>>     /date_created/
>>>     The date on which the data was created.
>>>     /date_modified/
>>>     The date on which this data was last modified.
>>>     /date_issued/
>>>     The date on which this data was formally issued.
>>>     /date_content_modified/
>>>     The date on which any of the provided content, including data,
>>>     metadata, and presented format, was last changed (including
>>>     creation)
>>>     /date_values_modified/
>>>     The date on which the provided data values were last changed
>>>     (including creation); excludes metadata and formatting changes
>>>     /date_product_generated/
>>>     The date on which this data file or product was
>>>     produced/distributed. While this date is like a file timestamp,
>>>     the date_content_modified and date_values_modified should be
>>>     used to assess the age of the contents of the file or product.
>>>     /date_product_originally_created/
>>>     The date on which this data file or product first came into
>>>     existence.
>>>
>>>     <END>
>>>
>>>
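The candidate definitions quoted in this thread imply some orderings among the proposed terms: content changes include value changes, and a product cannot be generated before its content was last changed. The sketch below checks those orderings, assuming the attributes are stored as ISO 8601 strings in a plain Python dict. The attribute values and the check_date_order helper are hypothetical, and the orderings are one reading of the quoted definitions, not part of any ACDD release.

```python
from datetime import datetime

# Hypothetical global attributes for one file, using the proposed terms
# quoted in this thread. Values are made up for illustration.
attrs = {
    "date_product_originally_created": "2014-01-15T00:00:00Z",
    "date_values_modified": "2014-09-30T12:00:00Z",
    "date_content_modified": "2014-10-02T09:00:00Z",
    "date_product_generated": "2014-10-07T17:41:00Z",
}


def parse_iso(value):
    # fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def check_date_order(attrs):
    """Return orderings (as read from the quoted definitions) that are violated.

    Assumed reading: content changes include value changes, and a product
    is generated no earlier than its content was last changed or first created.
    """
    dates = {name: parse_iso(value) for name, value in attrs.items()}
    pairs = [
        ("date_values_modified", "date_content_modified"),
        ("date_content_modified", "date_product_generated"),
        ("date_product_originally_created", "date_product_generated"),
    ]
    problems = []
    for earlier, later in pairs:
        # Skip any pair where one of the attributes is absent.
        if earlier in dates and later in dates and dates[earlier] > dates[later]:
            problems.append(f"{earlier} is after {later}")
    return problems


print(check_date_order(attrs))  # -> []
```

A file whose date_values_modified falls after its date_content_modified would be flagged, since under these definitions any change to values is also a change to content.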


-- 
*******************************************************
* Nan Galbraith        Information Systems Specialist *
* Upper Ocean Processes Group            Mail Stop 29 *
* Woods Hole Oceanographic Institution                *
* Woods Hole, MA 02543                 (508) 289-2444 *
*******************************************************