[Esip-documentation] Let's get rid of spatial and temporal bounds in ACDD
John Graybeal
jbgraybeal at mindspring.com
Fri Mar 7 15:29:31 EST 2014
On reflection, I do fundamentally disagree with your original proposal -- I thought you just wanted to not make these attributes recommended, but you want them specifically not recommended. This would take away a useful tool for many applications and systems, when as you note you already have the capability to detect problem files and compensate for them.
Point taken about the fact that standards come after the software is written, but it doesn't change my complaint about the software being broken.
Keeping in mind that copying metadata is in some sense a conscious decision, whether by developer or individual subsetter... The decision that it's OK to pass on any attribute that was in the original file, without qualifying that attribute in any way to indicate that it may not apply to the newly _altered_ file, is not thought through. The software has no way to tell what those attributes are describing (modification dates? versions? creators of the resulting files? file size? could be anything). To put them into the new file blindly is, well, broken.
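As a rough illustration of the alternative to blind copying, here is a minimal sketch of how a subsetting tool could rebuild the global attributes for its output. The function name `attrs_for_subset` and its parameters are hypothetical and not taken from any NetCDF or OPeNDAP client. It assumes plain min/max is adequate for longitude (no antimeridian crossing) and that the time values are ISO 8601 strings in one consistent format, so they sort lexicographically. Only the ACDD bounds attributes are recomputed; the sketch makes no attempt to judge whether the other copied attributes still apply.

```python
from typing import Any, Dict, Mapping, Sequence

# ACDD bounds attributes that can be recomputed from the coordinate values.
BOUNDS_ATTRS = {
    "geospatial_lat_min", "geospatial_lat_max",
    "geospatial_lon_min", "geospatial_lon_max",
    "time_coverage_start", "time_coverage_end",
}


def attrs_for_subset(
    source_attrs: Mapping[str, Any],
    lat: Sequence[float],
    lon: Sequence[float],
    time_iso: Sequence[str],
    note: str,
) -> Dict[str, Any]:
    """Build global attributes for a subset (hypothetical helper)."""
    # Copy everything except the bounds, which describe the original file.
    out = {k: v for k, v in source_attrs.items() if k not in BOUNDS_ATTRS}

    # Recompute the bounds from the coordinates actually written out.
    # Assumption: simple min/max, no antimeridian handling.
    out["geospatial_lat_min"] = min(lat)
    out["geospatial_lat_max"] = max(lat)
    out["geospatial_lon_min"] = min(lon)
    out["geospatial_lon_max"] = max(lon)

    # Assumption: uniformly formatted ISO 8601 strings sort chronologically.
    out["time_coverage_start"] = min(time_iso)
    out["time_coverage_end"] = max(time_iso)

    # Record that the file was altered instead of passing history through as-is.
    previous = source_attrs.get("history", "")
    out["history"] = f"{previous}\n{note}" if previous else note
    return out
```

A tool that cannot recompute a bound could just as well omit it; the point of the sketch is only that the copy is a decision made per attribute.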
You/we, the users of the resulting files, will do battle with the providers of those files. They will do battle with the developers of the utilities. (Who will in turn do battle with their funders...) But just maybe, identifying and calling out the bad practices -- while still appreciating the labors of the developers, who I'm sure had to work hard to solve many other issues -- will begin to get developers thinking about what those little fields, known and unknown, are there for.
I think the question of whether they are doing more harm than good has to be aggregated across all users and uses, including the many users who benefit from the economies of scale in even a few coupled frameworks. And today PO.DAAC may be one of only a few coupled frameworks -- by next year or the year after I bet we have dozens of coupled frameworks, all doing the right thing and interchanging these files with each other. The problematic files will quickly become the exception rather than the rule.
John
On Mar 7, 2014, at 11:41, "Signell, Richard" <rsignell at usgs.gov> wrote:
> John,
>
> I hear you, but I could argue that the software is broken because *we*
> broke it when we introduced ACDD about 20 years after NetCDF was
> invented. ;-)
>
> I'm only arguing for removal of the geospatial and temporal bound
> metadata, because subsetting and aggregation are so common in these
> data (heck, subsetting is one of the main reasons NetCDF and OPeNDAP
> developed). And because these attributes duplicate information that
> is already contained in the dataset.
>
> It's great that PO.DAAC does the right thing, but who is going to
> battle with all the developers of NetCDF/OPeNDAP clients (pretty long
> list at http://www.unidata.ucar.edu/software/netcdf/software.html) and
> all the R, Python and Matlab users who subset data?
>
> I can see that the bound attributes are useful in coupled frameworks
> like PO.DAAC, but in general I think they are causing more harm than good.
>
> I like to say incorrect metadata is worse than no metadata, but in
> this case we have incorrect metadata that is redundant with the
> information in the file already.
>
> Keep the comments coming. This is way more fun than what I was doing...
>
> -Rich
>
> On Fri, Mar 7, 2014 at 1:50 PM, John Graybeal <jbgraybeal at mindspring.com> wrote:
>>
>> On Mar 7, 2014, at 10:09, "Signell, Richard" <rsignell at usgs.gov> wrote:
>>
>>> The problem is that as soon as someone subsets or aggregates data from NetCDF or OPeNDAP datasets that use ACDD, the ACDD attributes in the resulting dataset are wrong.
>>
>> *If the client is broken*, yes, the metadata will be wrong. Any client that passes through metadata that it has effectively changed is BROKEN. Why are you accepting obvious bugs from the broken clients?
>>
>> I expand your rant because the list of metadata which that client is breaking won't be limited to the geospatial category. It's a safe bet that any of the following metadata fields are wrong too, if they exist:
>> - title
>> - summary
>> - history
>> - source (if the data was transformed in any way other than subsetting)
>> - processing_level (if transformations are involved)
>> - creator (and related)
>> - publisher (and related)
>> - license
>> - metadata_link
>> - time_coverage_resolution (this is a marker of intent, not necessarily derivable by examination of the file)
>> - uuid
>> - id
>> - naming_authority
>> While you're manually working with the selection of files that's maybe not a big deal, but at scale your collection of data will be a mess.
>>
>> In the end, I don't care which category the geospatial parameters end up in -- they may be computationally expensive to recalculate, but that's affordable at least.
>>
>> But I care a lot that we don't push back more on the real problem -- bad metadata that's created by broken software -- because it makes data reuse impossible to scale.
>>
>> John
>
>
>
> --
> Dr. Richard P. Signell (508) 457-2229
> USGS, 384 Woods Hole Rd.
> Woods Hole, MA 02543-1598
John Graybeal
jbgraybeal at mindspring.com