[Esip-documentation] cdm_data_type: summary and minimalist proposal

Nan Galbraith via Esip-documentation esip-documentation at lists.esipfed.org
Tue Dec 9 14:24:39 EST 2014


Given that it's a questionable practice to include fields containing commas
in a csv list, do we really want to specify how it should be done? And, in the
process, make our specification longer than necessary? This seems to me to
be outside the purpose of ACDD.

Enough said, I'm sure. If you all feel there's a need for this, I'll go along
with it, if reluctantly.

Thanks -
Nan
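The quoting rule quoted further down in this thread says that an entity containing a comma is enclosed in straight double quotes, and the quotes are not part of the entity. That is ordinary CSV quoting, so a reader can split such an attribute value with an existing CSV parser. Below is a minimal Python sketch. It assumes the attribute has already been read as a single string. The function name split_acdd_list is hypothetical and is not part of ACDD or any NetCDF library.

```python
import csv


def split_acdd_list(value: str) -> list[str]:
    """Split a comma-separated attribute value into entities (hypothetical helper).

    Assumes CSV-style quoting: an entity that contains a comma is wrapped in
    straight double quotes, and the quotes are not part of the entity.
    """
    # skipinitialspace lets the reader recognise a quote that follows ", "
    reader = csv.reader([value], skipinitialspace=True)
    row = next(reader, [])
    # Trim surrounding whitespace and drop empty entries
    return [field.strip() for field in row if field.strip()]


example = 'entity1, entity2, "entity3, with internal comma", entity 4'
print(split_acdd_list(example))
# ['entity1', 'entity2', 'entity3, with internal comma', 'entity 4']
```

The skipinitialspace option matters here. Without it, the space after each comma keeps the reader from recognizing the opening quote, and the internal comma would split the third entity in two.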

On 12/9/14 2:04 PM, Bob Simons - NOAA Federal via Esip-documentation wrote:
>
> On 2014-12-09 10:47 AM, Nan Galbraith via Esip-documentation wrote:
>> Aha, thanks Bob (and John), that wasn't clear to me - the entities are
>> within the list, and so are the double quotes. That sounds like a somewhat
>> rare scenario; are we sure there's support for that in all the NetCDF
>> utilities? I suspect Matlab might have a little problem with it, but
>> haven't tested it - either it hasn't been discussed on this list or I
>> completely missed it.
>>
>> If we're sure we're not introducing a feature that's not supported by the
>> NetCDF libraries, maybe an example would be helpful - I was definitely
>> confused by this text.
> I think there is general support for this approach since it has a 
> precedent in csv files. And if a given piece of software doesn't 
> support it, then hopefully someone will file a bug report. "Escaping" 
> is widely used because this general problem pops up over and over 
> again in different situations.
>
> And I think people make an effort to avoid the problem, e.g., they 
> don't make keywords that have internal commas.
>
>>
>>
>> Thanks -
>> Nan
>>
>>
>> On 12/9/14 1:12 PM, Bob Simons - NOAA Federal via Esip-documentation wrote:
>>> It may be confusing, but I think it is correct.
>>> Perhaps we need an example, e.g.,
>>> entity1, entity2, "entity3, with internal comma", entity 4
>>>
>>> On 2014-12-09 10:06 AM, Nan Galbraith via Esip-documentation wrote:
>>>> Do you want input via email, on the 'talk' page, or on the Active
>>>> Issues page?
>>>>
>>>> I just noticed something else on the main (draft) page that I think
>>>> should be changed:
>>>>
>>>>     Several attributes explicitly allow the entry of multiple entities as
>>>>     comma-separated values. The entities in such lists which contain a
>>>>     comma must be enclosed in straight double quotation marks ("), which
>>>>     will not be considered part of the entity.
>>>>
>>>> I'm not sure if this is correct, but it's certainly confusing; netcdf
>>>> inserts double quotes (e.g. in ncdump) around non-numeric values, and we
>>>> seem to be advising people to add a pair on their own. I think we're just
>>>> getting too far into the weeds with this level of detail.
>>>>
>>>> I recommend we drop the second sentence, or the whole paragraph on
>>>> comma-separated lists.
>>>>
>>>> Nan
>>>>
>>>>
>>>>
>>>>
>>>> On 12/9/14 12:32 PM, John Graybeal via Esip-documentation wrote:
>>>>> As previously promised, this email summarizes the status of the 
>>>>> cdm_data_type attribute.
>>>>>
>>>>> To start a discussion, I propose the removal of specific 
>>>>> references and code lists, to produce the following definition:
>>>>>
>>>>> "The organization of the data, as derived from the Common Data 
>>>>> Model's Scientific Data layer and understood by THREDDS. (This is 
>>>>> a THREDDS "dataType", and is different from the CF NetCDF 
>>>>> attribute 'featureType', which indicates a Discrete Sampling 
>>>>> Geometry file in CF.)"
>>>>>
>>>>> Below is background material, if you want to understand the 
>>>>> problem and the details.
>>>>>
>>>>> John
>>>>>
>>>>> *The Problem*
>>>>>
>>>>> At a minimum, the current 1.3 definition isn't perfect. Its list 
>>>>> of valid values (point, /profile/, /section/, station, 
>>>>> /station_profile/, trajectory, grid, image, or swath) doesn't 
>>>>> agree with the referenced list (point, station, trajectory, grid, 
>>>>> image, swath, /radial/). And the NODC guidance [3] is different as 
>>>>> well, saying "The current choices are: Grid, Image, Station, 
>>>>> Swath, and Trajectory."
>>>>>
>>>>> Additional concerns are expressed in Bob Simons' email of 10/16, 
>>>>> as follows:
>>>>> 1) For cdm_data_type, it is unfortunate that previous versions of 
>>>>> ACDD included a link to a specific list. Surely the intention is 
>>>>> to evolve as Unidata/THREDDS/the common data model evolves, even 
>>>>> if that particular list doesn't evolve. Can we please remove that 
>>>>> link and add the values from the CF DSG chapter that aren't in the 
>>>>> current list here: timeSeries, timeSeriesProfile, trajectoryProfile?
>>>>> 2) And please remove the NODC guidance link. That is NODC guidance 
>>>>> about the DSG variants that NODC prefers and is not strictly 
>>>>> relevant to a list of cdm data types.
>>>>>
>>>>> *History/References*
>>>>>
>>>>> In version 1.1, the definition for this attribute was "The 
>>>>> *THREDDS data type* appropriate for this data set.", with the 
>>>>> bolded text referencing 
>>>>> http://www.unidata.ucar.edu/projects/THREDDS/tech/catalog/InvCatalogSpec.html#dataType
>>>>>
>>>>> In making version 1.3, we removed cdm_data_type, then re-added it 
>>>>> for backward compatibility. The following definition was proposed 
>>>>> and yet survives: "The organization of the data, as derived from 
>>>>> the Common Data Model's Scientific Data layer and understood by 
>>>>> THREDDS (this is a *THREDDS "dataType"*). One of point, profile, 
>>>>> section, station, station_profile, trajectory, grid, image, or 
>>>>> swath. Please note that this is different from the CF NetCDF 
>>>>> attribute 'featureType' that indicates a Discrete Sampling 
>>>>> Geometry file---for guidance on those terms, please see the *NODC 
>>>>> guidance*." The first bold text points to the same address as in 
>>>>> v1.1; the second points to 
>>>>> http://www.nodc.noaa.gov/data/formats/netcdf/ [3].
>>>>>
>>>>> A detailed review of the discussion through Oct 6 starts at line 
>>>>> 15 of this Active Issues page 
>>>>> <https://docs.google.com/spreadsheets/d/19fl5AgGkckG03yTchUjYUp4YnR09Fn1Nqps2KHenkC4/edit#gid=0> 
>>>>> [5]. That latest proposal was to keep v1.3 cdm_data_type as is, 
>>>>> and not deprecate it, as there are some examples of its utility.
>>>>>
>>>>> A nice review of the current NetCDF-Java code usage of 
>>>>> cdm_data_type is here 
>>>>> <https://github.com/Unidata/thredds/issues/72> [1]. The upshot is 
>>>>> that if featureType is present, it is sufficient for that library; 
>>>>> but older applications may still require cdm_data_type. This 
>>>>> thread also points out an issue in the NODC guidance page at 
>>>>> http://www.nodc.noaa.gov/data/formats/netcdf/v1.1/ [4]. (Reference 
>>>>> [3] resolves to this). Another more recent, and arguably more 
>>>>> relevant, code citation is at Unidata's THREDDS code: 
>>>>> https://github.com/Unidata/thredds/blob/target-4.3.22/cdm/src/main/java/ucar/nc2/constants/FeatureType.java 
>>>>> [6]; it has a still longer list of items.
>>>>>
>>>>> *Options*
>>>>>
>>>>> These options are mix-and-match.
>>>>>
>>>>> The default option is doing nothing. Other conceivable options are:
>>>>> - Revert to previous wording from 1.1.
>>>>> - For the NODC issue: Just remove the NODC guidance link phrase 
>>>>> ("---for guidance on those terms, please see the NODC guidance").
>>>>> - For the inconsistency in the list of acceptable terms:
>>>>> A) Remove the link to *THREDDS "dataType"* (and let users figure 
>>>>> things out for themselves, or add more terms such as listed under 
>>>>> (1) above).
>>>>> B) Change the list of terms to match the current link to *THREDDS 
>>>>> "dataType"*.
>>>>> C) Change the link to *THREDDS "dataType"* and update the list of 
>>>>> terms to match whatever we point to.
>>>>> - For the general confusion about what this term is for, add 
>>>>> something like this sentence for context: "This attribute is 
>>>>> maintained for compliance with older files and applications, and 
>>>>> is neither needed nor recommended for most purposes (use 
>>>>> featureType instead)."
>>>>>
>>>>>
>>>>> *References*
>>>>> [1] THREDDS issue discussion of cdm_data_type: 
>>>>> https://github.com/Unidata/thredds/issues/72
>>>>> [2] THREDDS data type reference in 1.1 and 1.3: 
>>>>> http://www.unidata.ucar.edu/projects/THREDDS/tech/catalog/InvCatalogSpec.html#dataType
>>>>> [3] NODC guidance reference in 1.3: 
>>>>> http://www.nodc.noaa.gov/data/formats/netcdf/ (forwards to [4])
>>>>> [4] NODC guidance referenced by @shane-axiom: 
>>>>> http://www.nodc.noaa.gov/data/formats/netcdf/v1.1/
>>>>> [5] ACDD 1.3 Reconciliation Pages: Active Issues: 
>>>>> https://docs.google.com/spreadsheets/d/19fl5AgGkckG03yTchUjYUp4YnR09Fn1Nqps2KHenkC4/edit#gid=0
>>>>> [6] Unidata THREDDS code with a list of feature types: 
>>>>> https://github.com/Unidata/thredds/blob/target-4.3.22/cdm/src/main/java/ucar/nc2/constants/FeatureType.java
>>>>>

-- 
*******************************************************
* Nan Galbraith        Information Systems Specialist *
* Upper Ocean Processes Group            Mail Stop 29 *
* Woods Hole Oceanographic Institution                *
* Woods Hole, MA 02543                 (508) 289-2444 *
*******************************************************
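The inconsistency described under *The Problem* in John's message can be checked mechanically by comparing the value lists as sets. This is a small Python sketch. The lists are copied from the 1.3 definition, the linked THREDDS dataType list [2], and the NODC guidance [3]. Lowercasing NODC's capitalized names is an assumption made so that the sets are comparable.

```python
# Value lists as quoted in this thread; NODC names lowercased (an assumption)
acdd_13 = {"point", "profile", "section", "station", "station_profile",
           "trajectory", "grid", "image", "swath"}
thredds_datatype = {"point", "station", "trajectory", "grid", "image",
                    "swath", "radial"}
nodc_guidance = {"grid", "image", "station", "swath", "trajectory"}


def only_in(first: set[str], second: set[str]) -> list[str]:
    """Return, sorted, the values present in `first` but missing from `second`."""
    return sorted(first - second)


print(only_in(acdd_13, thredds_datatype))  # ['profile', 'section', 'station_profile']
print(only_in(thredds_datatype, acdd_13))  # ['radial']
print(only_in(acdd_13, nodc_guidance))     # ['point', 'profile', 'section', 'station_profile']
```

The first two lines of output match the italicized terms in John's summary. The terms profile, section and station_profile do not appear in the linked THREDDS list, and radial is missing from the 1.3 definition.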

