[Esip-documentation] ACDD date (version and other dates)

Jim Biard via Esip-documentation esip-documentation at lists.esipfed.org
Fri Oct 3 17:31:32 EDT 2014


Nan,

I'm not suggesting we try to describe everything in ACDD. I'm suggesting 
that we try to think about the problem in more generic terms, then 
narrow down to what belongs in ACDD. I find that when people rush into 
implementation details too quickly (and this is something that 
scientists are prone to do), things often get confused and tangled. If 
we take a moment to think about what the things are that we are trying 
to date stamp and how they relate to each other, then it may help us 
converge on the best solution.

I'm about to head home, so I'll get back to you on Monday with further 
thoughts on your email.

Grace and peace,

Jim

On 10/3/14, 4:00 PM, Nan Galbraith via Esip-documentation wrote:
> Hi Jim, and all -
>
> In the spirit of a comment from yesterday's meeting, people prefer short,
> simple specifications - let's not try to describe everything about
> versions of a data 'instance' in ACDD.  Since this is a discovery
> specification, we might narrow our discussion of version dates to what
> is needed for a user to find out whether a NetCDF file (instance) he
> encounters is something he needs to get his hands on.  At least, we may
> want to keep that in mind when we look at use cases for the various
> dates we're considering.
>
> Since I work with NetCDF files, I'm going to skip over the
> granule/collection part of Jim's email and get right to what we've been
> calling 'file times'.
>
> I have 2 useful time stamps, with use cases that are very common for
> in situ data - I know I've been harping on this for a long time, but I'll
> outline it again, please bear with me:
>
> - the time the last edit or processing of observed (e.g. temperature)
> or calculated (e.g. salinity) values occurred. This is the '*data
> version*'. Discovery use case: a colleague can't reproduce our bulk
> flux outputs, he needs to determine if his input data is the same
> 'data version' as what we used (otherwise, his algorithm may be
> different (therefore, wrong)).
>
> - the time the file was written, which could simply reflect formatting
> or metadata changes. Use case: data user has many questions about e.g.
> sensor heights, which may have been added to the data set after he
> accessed it. Having this time stamp allows him to see if his metadata
> is out of date; it also allows me to check if a remote server has the
> most up to date metadata and format.
>
> With regard to 'original' time - which you call 'data was first
> produced/acquired' - I have to get into the weeds to explain why this
> date should not be 'recommended', but might be in a category of
> 'suggested, if needed'.
>
> We put our real time data on the web, starting the moment the
> transmitters are turned on. There might be 1 record in each file at
> that time, and it's probably junk, since the instruments are in a
> parking lot - I may not even know this time, if the transmitters are
> turned on over the weekend. When we recover instruments, we discard the
> real time data and publish the 'first cut' of the internally recorded
> data, which is later overwritten by an edited version, re-processed
> with post-cals. What is the use case for providing the 'first produced'
> date for this kind of data?
>
> This was my earlier proposal; I'd be glad to change 'file date' to
> 'instance date' or something similar. I still like the idea of leaving
> it up to the user to decide what level of change precipitates a new
> version date.
>> Maybe we should use version_date for substantive changes, and
>> file_date for the actual time stamp of the file; it would then be up
>> to the provider to decide what constitutes a new version of a file;
>> slight formatting changes, additional non-critical metadata would
>> not, but new algorithms or added data might.
>
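
The two use cases above (is my data the same version, and is my metadata out of date) can be sketched as a small comparison. This is only an illustration under stated assumptions: the attribute names version_date and file_date are the ones floated in the proposal quoted above, not adopted ACDD attributes, and the helper names parse_stamp and compare_copies are hypothetical. The attribute dictionaries stand in for global attributes read from two copies of a file.

```python
from datetime import datetime


def parse_stamp(text):
    """Parse an ISO 8601 time stamp such as '2014-10-03T20:00:00Z'."""
    # Older Python versions reject a trailing 'Z', so spell out the UTC offset.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def compare_copies(mine, theirs):
    """Classify how a local copy differs from the provider's current copy.

    Both arguments are dicts of global attributes. 'version_date' and
    'file_date' are proposed names only (an assumption, not part of ACDD).
    """
    if parse_stamp(mine["version_date"]) != parse_stamp(theirs["version_date"]):
        # Observed or calculated values were edited or re-processed.
        return "different data version"
    if parse_stamp(mine["file_date"]) < parse_stamp(theirs["file_date"]):
        # Same values, but the file was rewritten (format or metadata change).
        return "same data, metadata or format out of date"
    return "up to date"
```

A user who gets "different data version" needs the new file to reproduce results; one who gets the second answer only needs it if the added metadata matters to him.
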
> Cheers -
> Nan
>
> On 10/3/14 2:06 PM, Jim Biard via Esip-documentation wrote:
>> Hi.
>>
>> I was wondering if it would be useful to back the whole date 
>> attribute question up a bit.
>>
>> Without using any existing or proposed attribute name, can each 
>> stakeholder describe what kinds of date stamps they need and want?
>>
>> When describing these date stamps, I see three different entities 
>> (sort of) that they might relate to - and there are probably more. 
>> The ones that I see are:
>>
>>   * Granule - An atom of data that is bounded in space and/or time.
>>     One granule can include multiple variables, and has variable- and
>>     granule-level metadata. A granule *is not* a netCDF file. It is
>>     data and metadata floating free in "the cloud".
>>   * Collection - A group of granules that are treated as a consistent
>>     whole. A collection may be static, or it may grow over time. As
>>     with a granule, a collection is a conceptual object in "the cloud".
>>   * Granule Instance - A granule expressed as one or more netCDF files.
>>
>>
>> Using these terms, here are date stamps that I find useful/needed. 
>> Most all of these should have accompanying annotations in history 
>> metadata.
>>
>>   * Date a granule's data was first produced/acquired. This can get
>>     tricky for a granule consisting of a long time series.
>>   * Date a granule's metadata was first associated with the data.
>>   * Date a granule's data was last modified.
>>   * Date a granule's metadata was last modified.
>>   * Date a granule instance was created.
>>   * Date a granule instance was last modified.
>>   * Date a collection was established. (I say it this way on account
>>     of growing collections.) I guess this amounts to a
>>     version/edition time stamp.
>>
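
As a way of seeing how the entities and date stamps listed above relate to each other, here is a minimal sketch of them as plain data classes. All class and field names are hypothetical and chosen only for illustration; none of them is a proposed attribute name, and every date is optional because a provider may not know it.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Granule:
    """An atom of data and metadata, independent of any file."""
    data_first_produced: Optional[datetime] = None
    metadata_first_associated: Optional[datetime] = None
    data_last_modified: Optional[datetime] = None
    metadata_last_modified: Optional[datetime] = None


@dataclass
class GranuleInstance:
    """A granule expressed as one or more netCDF files."""
    granule: Granule
    files: List[str] = field(default_factory=list)
    created: Optional[datetime] = None
    last_modified: Optional[datetime] = None


@dataclass
class Collection:
    """A group of granules treated as a consistent whole; it may grow."""
    established: Optional[datetime] = None
    granules: List[Granule] = field(default_factory=list)
```

Keeping the instance dates on GranuleInstance rather than on Granule reflects the distinction drawn above: rewriting a file changes the instance, not necessarily the data or metadata it expresses.
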
>>
>> There are other entities and date stamps that I have left out because 
>> I didn't see them as being relevant to a particular granule instance 
>> in a netCDF file.
>>
>> Do these make sense? Are there others that you can think of?
>>
>> Grace and peace,
>>
>> Jim
>>
>> CICS-NC <http://www.cicsnc.org/> Visit us on
>> Facebook <http://www.facebook.com/cicsnc> 	*Jim Biard*
>> *Research Scholar*
>> Cooperative Institute for Climate and Satellites NC <http://cicsnc.org/>
>> North Carolina State University <http://ncsu.edu/>
>> NOAA's National Climatic Data Center <http://ncdc.noaa.gov/>
>> 151 Patton Ave, Asheville, NC 28801
>> e: jbiard at cicsnc.org
>> o: +1 828 271 4900
>
>
> -- 
> *******************************************************
> * Nan Galbraith        Information Systems Specialist *
> * Upper Ocean Processes Group            Mail Stop 29 *
> * Woods Hole Oceanographic Institution                *
> * Woods Hole, MA 02543                 (508) 289-2444 *
> *******************************************************
>




