[Esip-documentation] ACDD date (version and other dates)
Nan Galbraith via Esip-documentation
esip-documentation at lists.esipfed.org
Fri Oct 3 16:00:59 EDT 2014
Hi Jim, and all -
In the spirit of a comment from yesterday's meeting (people prefer short,
simple specifications), let's not try to describe everything about versions
of a data 'instance' in ACDD. Since this is a discovery specification, we
might narrow our discussion of version dates to what is needed for a user
to find out whether a NetCDF file (instance) he encounters is something he
needs to get his hands on. At least, we may want to keep that in mind when
we look at use cases for the various dates we're considering.
Since I work with NetCDF files, I'm going to skip over the
granule/collection part of Jim's email and get right to what we've been
calling 'file times'.
I have 2 useful time stamps, with use cases that are very common for
in situ data - I know I've been harping on this for a long time, but I'll
outline it again, please bear with me:
- the time the last edit or processing of observed (e.g. temperature) or
calculated (e.g. salinity) values occurred. This is the '*data version*'.
Discovery use case: a colleague can't reproduce our bulk flux outputs; he
needs to determine if his input data is the same 'data version' as what we
used (otherwise, his algorithm may be different and therefore wrong).
- the time the file was written, which could simply reflect formatting or
metadata changes. Use case: a data user has many questions about e.g.
sensor heights, which may have been added to the data set after he
accessed it. Having this time stamp allows him to see if his metadata is
out of date; it also allows me to check if a remote server has the most
up-to-date metadata and format.
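
To make the two use cases above concrete, here is a minimal sketch of how a
data user might compare his copy of a file against the one on a server. It
works on plain dictionaries of global attributes rather than on real NetCDF
files, and the attribute names 'data_version_time' and 'file_written_time'
are placeholders invented for this example, not proposed ACDD names. The
ISO 8601 UTC stamp format is also an assumption.

```python
from datetime import datetime, timezone


def parse_stamp(text):
    """Parse a UTC stamp such as '2014-10-03T16:00:59Z' (assumed format)."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def compare_copies(local_attrs, remote_attrs):
    """Report what, if anything, is stale in the local copy.

    Both arguments are dicts of global attributes. The two keys used
    here are hypothetical names for the two time stamps described above.
    """
    local_data = parse_stamp(local_attrs["data_version_time"])
    remote_data = parse_stamp(remote_attrs["data_version_time"])
    local_file = parse_stamp(local_attrs["file_written_time"])
    remote_file = parse_stamp(remote_attrs["file_written_time"])

    # Newer data values matter most: results may not be reproducible.
    if remote_data > local_data:
        return "data out of date"
    # Same data, but the file was rewritten: metadata or format changed.
    if remote_file > local_file:
        return "metadata or format out of date"
    return "up to date"
```

With stamps like these, the colleague with the bulk flux problem would get
"data out of date", while the user asking about sensor heights would get
"metadata or format out of date".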
With regard to the 'original' time, which you call 'data was first
produced/acquired', I have to get into the weeds to explain why this date
should not be 'recommended', but might be in a category of 'suggested, if
needed'.
We put our real time data on the web, starting the moment the transmitters
are turned on. There might be 1 record in each file at that time, and it's
probably junk, since the instruments are in a parking lot - I may not even
know this time, if the transmitters are turned on over the weekend. When we
recover instruments, we discard the real time data and publish the 'first
cut' of the internally recorded data, which is later overwritten by an
edited version, re-processed with post-cals.
What is the use case for providing the 'first produced' date for this kind
of data?
This was my earlier proposal; I'd be glad to change 'file date' to
'instance date' or something similar. I still like the idea of leaving it
up to the user to decide what level of change precipitates a new version
date.
> Maybe we should use version_date for substantive changes, and
> file_date for the actual time stamp of the file; it would then be up
> to the provider to decide what constitutes a new version of a file;
> slight formatting changes, additional non-critical metadata would
> not, but new algorithms or added data might.
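
As a rough illustration of the proposal quoted above, the sketch below shows
how a provider's writing code might update the two attributes. The names
version_date and file_date come from the proposal; the function name, the
change labels, and the set of changes counted as substantive are
hypothetical, since the proposal leaves that decision to whoever produces
the file. The stamp format is an assumption as well.

```python
from datetime import datetime, timezone

# Hypothetical labels for changes this provider treats as a new version.
# Which changes belong here is a local decision, not part of the proposal.
SUBSTANTIVE_CHANGES = {"new_algorithm", "added_data", "edited_values"}


def stamp_rewrite(attrs, change_kind, now=None):
    """Return a copy of the global attributes with updated date stamps.

    file_date is updated on every write; version_date is updated only
    when the change is one the provider considers substantive.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")

    updated = dict(attrs)  # leave the caller's dict untouched
    updated["file_date"] = stamp
    if change_kind in SUBSTANTIVE_CHANGES:
        updated["version_date"] = stamp
    return updated
```

A formatting-only rewrite would then move file_date forward and leave
version_date alone, while a re-processing with a new algorithm would move
both.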
Cheers -
Nan
On 10/3/14 2:06 PM, Jim Biard via Esip-documentation wrote:
> Hi.
>
> I was wondering if it would be useful to back the whole date attribute
> question up a bit.
>
> Without using any existing or proposed attribute name, can each
> stakeholder describe what kinds of date stamps they need and want?
>
> When describing these date stamps, I see three different entities
> (sort of) that they might relate to - and there are probably more. The
> ones that I see are:
>
> * Granule - An atom of data that is bounded in space and/or time.
> One granule can include multiple variables, and has variable- and
> granule-level metadata. A granule *is not* a netCDF file. It is
> data and metadata floating free in "the cloud".
> * Collection - A group of granules that are treated as a consistent
> whole. A collection may be static, or it may grow over time. As
> with a granule, a collection is a conceptual object in "the cloud".
> * Granule Instance - A granule expressed as one or more netCDF files.
>
>
> Using these terms, here are date stamps that I find useful/needed.
> Most all of these should have accompanying annotations in history
> metadata.
>
> * Date a granule's data was first produced/acquired. This can get
> tricky for a granule consisting of a long time series.
> * Date a granule's metadata was first associated with the data.
> * Date a granule's data was last modified.
> * Date a granule's metadata was last modified.
> * Date a granule instance was created.
> * Date a granule instance was last modified.
> * Date a collection was established. (I say it this way on account
> of growing collections.) I guess this amounts to a version/edition
> time stamp.
>
>
> There are other entities and date stamps that I have left out because
> I didn't see them as being relevant to a particular granule instance
> in a netCDF file.
>
> Do these make sense? Are there others that you can think of?
>
> Grace and peace,
>
> Jim
>
> *Jim Biard*
> *Research Scholar*
> Cooperative Institute for Climate and Satellites NC <http://cicsnc.org/>
> North Carolina State University <http://ncsu.edu/>
> NOAA's National Climatic Data Center <http://ncdc.noaa.gov/>
> 151 Patton Ave, Asheville, NC 28801
> e: jbiard at cicsnc.org
> o: +1 828 271 4900
--
*******************************************************
* Nan Galbraith Information Systems Specialist *
* Upper Ocean Processes Group Mail Stop 29 *
* Woods Hole Oceanographic Institution *
* Woods Hole, MA 02543 (508) 289-2444 *
*******************************************************