[Esip-documentation] Let's get rid of spatial and temporal bounds in ACDD

Signell, Richard rsignell at usgs.gov
Fri Mar 7 14:41:19 EST 2014


John,

I hear you, but I could argue that the software is broken because *we*
broke it when we introduced ACDD about 20 years after NetCDF was
invented.  ;-)

I'm only arguing for removal of the geospatial and temporal bounds
metadata, because subsetting and aggregation are so common with these
data (heck, subsetting is one of the main reasons NetCDF and OPeNDAP
were developed), and because these attributes duplicate information
that is already contained in the dataset.
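
To make the redundancy concrete, here is a minimal Python sketch. The
coordinate values, the epoch, and the recompute_bounds helper are all
made up for illustration; a real client would read the coordinate
variables through a NetCDF or OPeNDAP library instead of using plain
lists. It shows that the ACDD bounds can be derived from the
coordinates, and that a subset which copies the global attributes
through unchanged ends up with stale values.

```python
from datetime import datetime, timedelta, timezone


def recompute_bounds(lat, lon, time_offsets_hours, epoch):
    """Derive ACDD bound attributes from coordinate values.

    Hypothetical helper: assumes time is stored as hours since `epoch`.
    """
    start = epoch + timedelta(hours=min(time_offsets_hours))
    end = epoch + timedelta(hours=max(time_offsets_hours))
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "geospatial_lat_min": min(lat),
        "geospatial_lat_max": max(lat),
        "geospatial_lon_min": min(lon),
        "geospatial_lon_max": max(lon),
        "time_coverage_start": start.strftime(fmt),
        "time_coverage_end": end.strftime(fmt),
    }


# Made-up coordinate variables standing in for a full dataset.
lat = [38.0, 39.0, 40.0, 41.0, 42.0]
lon = [-74.0, -73.0, -72.0, -71.0, -70.0]
hours = [0, 6, 12, 18, 24]
epoch = datetime(2014, 3, 1, tzinfo=timezone.utc)

full_attrs = recompute_bounds(lat, lon, hours, epoch)

# A user subsets the data by index, as any client might.
sub_lat, sub_lon, sub_hours = lat[1:3], lon[2:4], hours[:2]

# A pass-through client copies the global attributes unchanged...
stale_attrs = dict(full_attrs)
# ...while the correct values follow from the subset coordinates.
fresh_attrs = recompute_bounds(sub_lat, sub_lon, sub_hours, epoch)

for name in fresh_attrs:
    if stale_attrs[name] != fresh_attrs[name]:
        print(name, "stale:", stale_attrs[name], "actual:", fresh_attrs[name])
```

Everything the helper returns comes from the coordinate variables
alone, which is the sense in which the attributes are redundant.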

It's great that PO.DAAC does the right thing, but who is going to
battle with all the developers of NetCDF/OPeNDAP clients (pretty long
list at http://www.unidata.ucar.edu/software/netcdf/software.html) and
all the R, Python and Matlab users who subset data?

I can see that the bounds attributes are useful in coupled frameworks
like PO.DAAC, but in general I think they are causing more harm than
good.

I like to say incorrect metadata is worse than no metadata, but in
this case we have incorrect metadata that is redundant with the
information in the file already.

Keep the comments coming.  This is way more fun than what I was doing...

-Rich

On Fri, Mar 7, 2014 at 1:50 PM, John Graybeal <jbgraybeal at mindspring.com> wrote:
>
> On Mar 7, 2014, at 10:09, "Signell, Richard" <rsignell at usgs.gov> wrote:
>
>> The problem is that as soon as someone subsets or aggregates data from netcdf or opendap datasets that use ACDD, the ACDD attributes in the resulting dataset are wrong.
>
> *If the client is broken*, yes, the metadata will be wrong. Any client that passes through metadata that it has effectively changed is BROKEN. Why are you accepting obvious bugs from the broken clients?
>
> I expand your rant because the list of metadata which that client is breaking won't be limited to the geospatial category.  It's a safe bet that any of the following metadata fields are wrong too, if they exist:
> - title
> - summary
> - history
> - source (if the data was transformed in any way other than subsetting)
> - processing_level (if transformations are involved)
> - creator (and related)
> - publisher (and related)
> - license
> - metadata_link
> - time_coverage_resolution (this is a marker of intent, not necessarily derivable by examination of the file)
> - uuid
> - id
> - naming_authority
> While you're manually working with the selection of files that's maybe not a big deal, but at scale your collection of data will be a mess.
>
> In the end, I don't care which category the geospatial parameters end up in -- they may be computationally expensive to recalculate, but that's affordable at least.
>
> But I care a lot that we don't push back more on the real problem -- bad metadata that's created by broken software -- because it makes data reuse impossible to scale.
>
> John



-- 
Dr. Richard P. Signell   (508) 457-2229
USGS, 384 Woods Hole Rd.
Woods Hole, MA 02543-1598

