[Esip-documentation] cdm_data_type: summary and minimalist proposal

John Graybeal via Esip-documentation esip-documentation at lists.esipfed.org
Tue Dec 9 13:37:33 EST 2014


Emailing to the group is best for new issues and discussion; if it gets in the weeds we'll move to another process.

This text section was driven by the desire to be clear about how to represent multi-entity fields consistently, so that computers can parse them. It basically just says "CSV" formatting applies. We could say RFC 4180 applies if we want to be precise, but the idea that a CRLF might be embedded in a value is pretty freaky. http://tools.ietf.org/html/rfc4180

If ncdump inserts double quotes, one imagines it does so correctly, escaping existing double quotes as necessary. If it doesn't, it will break when someone includes a double quote in their string.
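To make the "CSV formatting applies" rule concrete, here is a minimal Python sketch using the standard library's csv module (its default dialect is close to, though not identical to, RFC 4180). The helper names split_multi_value and join_multi_value are hypothetical and are not part of ACDD or any netCDF tool. The sketch assumes entities are separated by a comma plus optional spaces, that an entity containing a comma is wrapped in straight double quotes which are not part of the entity, and that a literal double quote inside a quoted entity is written twice.

```python
import csv
import io


def split_multi_value(attr: str) -> list[str]:
    """Split a multi-entity attribute value into its entities (hypothetical helper).

    skipinitialspace=True lets '"..."' be recognized as a quoted entity even
    when it follows ", " rather than a bare ",".
    """
    reader = csv.reader(io.StringIO(attr), skipinitialspace=True)
    return next(reader, [])  # an empty attribute yields no entities


def join_multi_value(entities: list[str]) -> str:
    """Inverse of split_multi_value (hypothetical helper).

    QUOTE_MINIMAL quotes only entities containing a comma, a double quote,
    or a line break, and doubles any embedded double quotes.
    """
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerow(entities)
    # writerow appends the default "\r\n" record terminator; drop it.
    return buf.getvalue().removesuffix("\r\n")
```

For example, split_multi_value('NOAA, "Woods Hole, MA", WHOI') gives the three entities NOAA, Woods Hole, MA (one entity), and WHOI; going the other way, an entity such as say "hi" comes out as "say ""hi""", which is exactly the escaping a tool like ncdump would need to get right.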

John



On Dec 9, 2014, at 10:06, Nan Galbraith via Esip-documentation <esip-documentation at lists.esipfed.org> wrote:

> Do you want input via email, on the 'talk' page, or on the Active Issues page?
> 
> I just noticed something else on the main (draft) page that I think should be 
> changed:
> 
> Several attributes explicitly allow the entry of multiple entities as comma-separated 
> values. The entities in such lists which contain a comma must be enclosed in straight 
> double quotation marks ("), which will not be considered part of the entity. 
> I'm not sure if this is correct, but it's certainly confusing; netcdf inserts double quotes
> (e.g. in ncdump) for non-numeric values, and we seem to be advising people to add a pair 
> on their own.  I think we're just getting too far into the weeds with this level of detail. 
> 
> I recommend we drop the second sentence, or the whole paragraph on comma-separated
> lists.
> 
> Nan
> 
> 
> 
> 
> On 12/9/14 12:32 PM, John Graybeal via Esip-documentation wrote:
>> As previously promised, this email summarizes the status of the cdm_data_type attribute. 
>> 
>> To start a discussion, I propose the removal of specific references and code lists, to produce the following definition:
>> 
>> "The organization of the data, as derived from the Common Data Model's Scientific Data layer and understood by THREDDS. (This is a THREDDS "dataType", and is different from the CF NetCDF attribute 'featureType', which indicates a Discrete Sampling Geometry file in CF.)"
>> 
>> Below is background material, if you want to understand the problem and the details. 
>> 
>> John
>> 
>> The Problem
>> 
>> At a minimum, the current 1.3 definition isn't perfect. Its list of valid values (point, profile, section, station, station_profile, trajectory, grid, image, or swath) doesn't agree with the referenced list (point, station, trajectory, grid, image, swath, radial). And the NODC guidance [3] is different as well, saying "The current choices are: Grid, Image, Station, Swath, and Trajectory."
>> 
>> Additional concerns are expressed in Bob Simons' email of 10/16, as follows:
>>   1) For cdm_data_type, it is unfortunate that previous versions of ACDD included a link to a specific list. Surely the intention is to evolve as Unidata/THREDDS/the common data model evolves, even if that particular list doesn't evolve.  Can we please remove that link and add the values from the CF DSG chapter that aren't in the current list here: timeSeries, timeSeriesProfile, trajectoryProfile?
>>   2) And please remove the NODC guidance link. That is NODC guidance about the DSG variants that NODC prefers and is not strictly relevant to a list of cdm data types.
>> 
>> History/References
>> 
>> In version 1.1, the definition for this attribute was "The THREDDS data type appropriate for this data set.", with the bolded text referencing http://www.unidata.ucar.edu/projects/THREDDS/tech/catalog/InvCatalogSpec.html#dataType
>> 
>> In making version 1.3, we removed cdm_data_type, then re-added it for backward compatibility. The following definition was proposed and still survives: "The organization of the data, as derived from the Common Data Model's Scientific Data layer and understood by THREDDS (this is a THREDDS "dataType"). One of point, profile, section, station, station_profile, trajectory, grid, image, or swath. Please note that this is different from the CF NetCDF attribute 'featureType' that indicates a Discrete Sampling Geometry file—for guidance on those terms, please see the NODC guidance." The first bold text points to the same address as in v1.1; the second points to http://www.nodc.noaa.gov/data/formats/netcdf/ [3].
>> 
>> A detailed review of the discussion through Oct 6 starts at line 15 of this Active Issues page [5]. That latest proposal was to keep v1.3 cdm_data_type as is, and not deprecate it, as there are some examples of its utility.
>> 
>> A nice review of the current NetCDF-Java code usage of cdm_data_type is here [1]. The upshot is that if featureType is present, it is sufficient for that library; but older applications may still require cdm_data_type. This thread also points out an issue in the NODC guidance page at http://www.nodc.noaa.gov/data/formats/netcdf/v1.1/ [4]. (Reference [3] resolves to this). Another more recent, and arguably more relevant, code citation is at Unidata's THREDDS code: https://github.com/Unidata/thredds/blob/target-4.3.22/cdm/src/main/java/ucar/nc2/constants/FeatureType.java [6]; it has a still longer list of items.
>> 
>> Options
>> 
>> These options are mix-and-match.
>> 
>> The default option is doing nothing. Other conceivable options are:
>> - Revert to previous wording from 1.1.
>> - For the NODC issue: Just remove the NODC guidance link phrase ("—for guidance on those terms, please see the NODC guidance").
>> - For the inconsistency in the list of acceptable terms:
>>    A) Remove the link to THREDDS "dataType" (and let users figure things out for themselves, or add more terms such as listed under (1) above).
>>    B) Change the list of terms to match the current link to THREDDS "dataType".
>>    C) Change the link to THREDDS "dataType" and update the list of terms to match whatever we point to. 
>> - For the general confusion about what this term is for, add something like this sentence for context: "This attribute is maintained for compliance with older files and applications, and is neither needed nor recommended for most purposes (use featureType instead)."
>> 
>> 
>> References
>> 
>> [1] THREDDS issue discussion of cdm_data_type: https://github.com/Unidata/thredds/issues/72
>> [2] THREDDS data type reference in 1.1 and 1.3: http://www.unidata.ucar.edu/projects/THREDDS/tech/catalog/InvCatalogSpec.html#dataType
>> [3] NODC guidance reference in 1.3: http://www.nodc.noaa.gov/data/formats/netcdf/   (forwards to [4])
>> [4] NODC guidance referenced by @shane-axiom: http://www.nodc.noaa.gov/data/formats/netcdf/v1.1/
>> [5] ACDD 1.3 Reconciliation Pages: Active Issues: https://docs.google.com/spreadsheets/d/19fl5AgGkckG03yTchUjYUp4YnR09Fn1Nqps2KHenkC4/edit#gid=0
>> [6] Unidata THREDDS code with a list of feature types: https://github.com/Unidata/thredds/blob/target-4.3.22/cdm/src/main/java/ucar/nc2/constants/FeatureType.java
>> 
>> 
>> _______________________________________________
>> Esip-documentation mailing list
>> Esip-documentation at lists.esipfed.org
>> http://www.lists.esipfed.org/mailman/listinfo/esip-documentation
> 
> 
> -- 
> *******************************************************
> * Nan Galbraith        Information Systems Specialist *
> * Upper Ocean Processes Group            Mail Stop 29 *
> * Woods Hole Oceanographic Institution                *
> * Woods Hole, MA 02543                 (508) 289-2444 *
> *******************************************************
> 
> 
