[Esip-discovery] Datacasting custom elements proposal

Fri Mar 18 10:29:48 EDT 2011

Hi Chris,

>
> I tend to agree with the school of thought that constructs custom tags as elements:
> (1) validation by schema is especially important for custom tags, esp. since documentation of the custom tags is likely to be more uneven than that for widely accepted standard tags

Yep.

> (2) I think tag collision, while a possibility, does not rise to the level of a serious problem, because we can mitigate it through other means:
>  (a) Define a Best Practice that all ESIP Discovery-related tags in an ESIP Discovery response be qualified with explicit namespaces.  If custom tags are now put into the default namespace, any aggregator will easily distinguish them.
>  (b) Develop compliance checkers for Discovery responses that look for tag collisions (using some basic heuristics) and offer warnings where they might exist.  Note that the ability to validate the contents against a validation schema is one of the heuristics that can go into them.
>  (c) Provide a feed construction API that follows (a) and validates per (b)

Yep, and I would also add:

(d) (as a downstream consumer) Be permissive of upstream ESIP Discovery responses that include custom tags. IOW, just do like Firefox or any of the other mass feed readers and aggregators out there -- if there's something it doesn't know what to do with (a custom tag, a custom attribute, etc.), decide whether or not to flag it, move on, and keep processing the rest of the doc. I can arbitrarily add any ol' random RSS/Feed/RDF tag to a channel/item stream and Firefox will still pick out what it's interested in and display it as a Live Bookmark. This kind of resiliency is what I consider to be a best practice.
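
Here's a rough sketch of what I mean by (d), in Python using only the standard library. Treat it as an illustration and not a reference implementation: the namespace URIs, the KNOWN_TAGS set, and the read_items helper are all made up for the example (the real Datacasting URI isn't given in this thread).

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real Datacasting URI is not given in this thread.
DATACASTING_NS = "http://example.org/datacasting"

# Tags this toy consumer understands, in ElementTree's "{uri}local" form.
KNOWN_TAGS = {
    "title",
    "link",
    "{%s}dataSource" % DATACASTING_NS,
}

FEED = """<rss version="2.0" xmlns:datacasting="http://example.org/datacasting"
     xmlns:foo="http://example.org/foo">
  <channel>
    <item>
      <title>My FOO RSS Item</title>
      <datacasting:dataSource>NODC</datacasting:dataSource>
      <foo:barElem>Some FOO value</foo:barElem>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return (items, flagged): known fields per item, plus skipped tag names."""
    root = ET.fromstring(feed_xml)
    items, flagged = [], []
    for item in root.iter("item"):
        fields = {}
        for child in item:
            if child.tag in KNOWN_TAGS:
                fields[child.tag] = (child.text or "").strip()
            else:
                # Unknown tag: flag it and keep processing the rest of the doc.
                flagged.append(child.tag)
        items.append(fields)
    return items, flagged


items, flagged = read_items(FEED)
print(items)
print(flagged)
```

The consumer never raises on foo:barElem; it just records the tag and carries on, which is the resilient behavior I'm arguing for.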

>
> Granted, a feed developer could still thwart this by using the explicit namespace and misusing the tag, or by continuing to put everything in the default namespace, but such a developer can probably also come up with a way of misusing attributes as well 8-).  At some point, we have to trust the developers (who at this point are mostly...us) to do the right thing.

Sure, agreed.
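
FWIW, one of the "basic heuristics" from (b) could be as simple as warning when the same local tag name shows up under more than one namespace (including no namespace at all). A minimal sketch, assuming Python's standard library; the namespace URI and the split_tag / find_collisions helpers are hypothetical names I'm inventing for illustration:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def split_tag(tag):
    """Split ElementTree's '{uri}local' form into (uri, local); uri is '' if none."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def find_collisions(feed_xml):
    """Map each local tag name used under more than one namespace to those namespaces."""
    seen = defaultdict(set)
    for elem in ET.fromstring(feed_xml).iter():
        uri, local = split_tag(elem.tag)
        seen[local].add(uri)
    return {local: sorted(uris) for local, uris in seen.items() if len(uris) > 1}


# Hypothetical feed: one namespaced dataSource and one left in no namespace.
FEED = """<rss version="2.0" xmlns:datacasting="http://example.org/datacasting">
  <channel>
    <item>
      <datacasting:dataSource>NODC</datacasting:dataSource>
      <dataSource>something else entirely</dataSource>
    </item>
  </channel>
</rss>"""

for local, uris in find_collisions(FEED).items():
    print("warning: '%s' appears in namespaces %s" % (local, uris))
```

A checker like this only warns; whether a reported collision is an actual misuse is still a judgment call for the feed developer.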

Cheers,
Chris

>
>>
>> ________________________________________
>> From: esip-discovery-bounces at lists.esipfed.org [esip-discovery-bounces at lists.esipfed.org] On Behalf Of Mattmann, Chris A (388J) [chris.a.mattmann at jpl.nasa.gov]
>> Sent: Thursday, March 17, 2011 9:43 PM
>> To: Mccleese, Sean W (388A)
>> Cc: esip-discovery at lists.esipfed.org
>> Subject: Re: [Esip-discovery] Datacasting custom elements proposal
>>
>> Hi Sean,
>>
>>> Sorry it took me a while to reply!
>>
>> No problemo!
>>
>>>
>>> In regards to the tag collision aspect I keep bringing up, I think I'm being a little vague as to my concerns. I'm imagining the following scenario:
>>>
>>> Let's say there is a tag "datacasting:dataSource" which is pre-defined by the spec to represent the originating data provider. This seems like a plausible scenario, as the "datacasting" namespace would want to define some commonly used *casting attributes, such as data provider, acquisition time, etc. Within a particular feed, a data provider could also define a custom element "dataSource" to mean something else entirely. A client application written to interpret "dataSource" would then assume it meant the spec's definition, which could result in unpredictable behavior. Now, obviously the feed provider isn't following the spec in this case, but I think that's a possible situation that the design should inhibit.
>>
>> Why would it be a problem if the tags were namespaced? I.e., if the feedreader leveraged the understanding of namespaces, then:
>>
>> <item>
>> <datacasting:dataSource>foo</datacasting:dataSource>
>> <chriscoolnamespace:dataSource>bar</chriscoolnamespace:dataSource>
>> ...
>> </item>
>>
>> Would be different things?
>>
>>>
>>> Limiting custom elements to attributes within the "datacasting:customElement" tag would make such collisions significantly less likely.
>>
>> I'm not so sure about that -- it entirely has to do with whether the feedreader is namespace aware.
>>
>>> Similarly, the channel-level definitions for custom elements allow us to define unit measurements as well as datatype. Those *could* be defined on a per-tag basis, but that would clutter up the XML and add some serious headaches for implementers with differing datatypes (or altogether missing ones).
>>
>> Can you show an example of this?
>>
>>>
>>>
>>> An additional wrinkle in the tag vs. attribute conversation is one of run-time foreknowledge of custom element existence. The Datacasting Feed Reader we have implemented allows for users to "filter" feeds based on tag metadata. So, for example, I could filter a feed based on "dataSource='NODC'" and the Feed Reader would then display only items/entries within the selected feed whose "dataSource" tag equaled "NODC".
>>
>> Gotcha. You could also do: "ns:dataSource=NODC" where ns is a namespace, and I think that would take care of it. Check out this Internet-Draft for a feed query language that might be worth leveraging:
>>
>> http://tools.ietf.org/html/draft-nottingham-atompub-fiql-00
>>
>>>
>>> However, we can easily imagine some metadata tags that may not be present within all items (e.g., when some physical phenomenon is not measured). I can imagine a feed that updates frequently where there are some metadata tags that are only present in a small fraction of items. In that case, there could be times where the entire presented RSS .xml file does not contain any instances of the tag in question. That would preclude the Feed Reader (or any feed reader I can imagine) from providing filtering support for that tag, as the feed reader wouldn't be aware of the existence of that tag at all.
>>
>> Aren't feed queries directed by the user, which directs the feed reader? IOW, isn't the user the one that queries for the particular tag? I'm not sure how using <ns:tagName> versus <datacasting:element name="tag">..</...> has any impact on that.
>>
>>>
>>> This obviously leads to the reasoning behind the channel-level custom element tag definitions, as a feed reader should interpret those as a comprehensive list of all possible custom elements that may appear in the feed. The current proposed spec has the "datacasting:customEltDef" channel-level tag with attributes linking it to the item-level "datacasting:customElement" tag. We could, I suppose, continue to use the "datacasting:customEltDef" with attributes that link to the tag-level custom names (like your example of <datacasting:windSpeed>); however, I think it's a bit more "graceful" (for lack of a better word) to have both channel & tag-level custom elements define their mapping through the attributes.
>>>
>>> Long-winded response, I know, but I think this is a topic that merits discussion for sure.
>>
>> For sure. Thanks for your response!
>>
>> Cheers,
>> Chris
>>
>>>
>>> Thanks!
>>> -Sean
>>>
>>> -----Original Message-----
>>> From: Mattmann, Chris A (388J)
>>> Sent: Monday, March 14, 2011 10:26 PM
>>> To: Mccleese, Sean W (388A)
>>> Cc: esip-discovery at lists.esipfed.org
>>> Subject: Re: [Esip-discovery] Datacasting custom elements proposal
>>>
>>> Hi Sean,
>>>
>>>> We used ROME for our implementation of our Datacasting client as well, which is how we're handling the custom elements at this point.
>>>
>>> Gotcha, thanks.
>>>
>>>>
>>>> We definitely could namespace all the tags (datacasting:maxWindSpeed, etc.); however, in some cases we may want "reserved" namespaced tags (i.e., datacasting:guid or datacasting:datacenter, etc.) and we would have to carefully deal with collision situations there, whereas using attributes completely eliminates that potential pitfall.
>>>
>>> I'm not seeing this as a problem? So long as you namespace the tags, it wouldn't matter if you had datacasting:guid, as well as chris:guid, so long as xmlns:datacasting, and xmlns:chris were defined, no? That's the whole point of XML namespacing and schema, to tackle the namespace issue.
>>>
>>> Attributes may obviate the need to deal with xmlns and Schema, but it adds another element to e.g., validation (can't use schema).
>>>
>>> That said, I say this being a guy that uses attributes a ton in a lot of the XML I write and standards that I participate in :)
>>>
>>>> We figured that the "custom element" concept is a tag-level concept whereas the name/value/etc is more of a descriptor for that type, similar to the "link" tag in Atom.
>>>
>>> Cool. OK.
>>>
>>>>
>>>> In regards to the specification question: You might be right -- that may not be an actual requirement of the specification. I may have misspoken on that. Every RSS/Atom reader I've used, though, as a best practice does ignore tags that are unrecognized. I guess that's what I was trying to say. Sorry for the mis-application of specifications!
>>>
>>> No worries! I'm not an expert for sure, and was more trying to figure out the design choices and rationale behind them.
>>>
>>> Thanks for taking the time to answer my questions!
>>>
>>> Cheers,
>>> Chris
>>>
>>>>
>>>> -Sean
>>>>
>>>> ________________________________________
>>>> From: Mattmann, Chris A (388J)
>>>> Sent: Monday, March 14, 2011 7:03 PM
>>>> To: Mccleese, Sean W (388A)
>>>> Cc: esip-discovery at lists.esipfed.org
>>>> Subject: Re: [Esip-discovery] Datacasting custom elements proposal
>>>>
>>>> Hey Sean,
>>>>
>>>>>
>>>>> Great question!
>>>>
>>>> Thanks!
>>>>
>>>>>
>>>>> The reason we went with the element name in the tag attributes instead of the tag name itself is that a lot of RSS/Atom parsers will look for "understood" tags by their tag name and, in finding one that's unrecognized, will simply move on to the next tag.
>>>>
>>>> Can you name a few examples? I'd be interested in trying some out. I'm most familiar with the Java ROME API, but also have used commons-feedparser in the past (no longer maintained, but was pretty robust back when I wrote an RSS parser for Nutch in 2005).
>>>>
>>>>> We didn't want the "datacasting" namespace to, as a default action, encompass all possible tag names (with possible conflicts, etc, arising from that).
>>>>
>>>> Couldn't you just namespace the other tags? In other words, it would be perfectly acceptable to have:
>>>>
>>>> <?xml version="1.0"?>
>>>>
>>>> <item
>>>> xmlns:dc="...dublin core uri..."
>>>> xmlns:datacasting="...data casting uri...">
>>>> <dc:Title>My FOO RSS Item</dc:Title>
>>>> <datacasting:maxWindSpeed>81.2</datacasting:maxWindSpeed>
>>>> <foo:barElem xmlns:foo="blah blah uri">Some FOO value</foo:barElem>
>>>>
>>>> ..
>>>> </item>
>>>>
>>>> Right?
>>>>
>>>>>
>>>>> By putting it in the attributes for the tag itself, we can override the "ignore if unrecognized" requirement for the RSS spec, which is where the channel-level definitions come into play. That also helps with the datatype / units declarations as well.
>>>>
>>>> What version of the RSS spec has this? I'm familiar with 2.0, is that the version you're talking about?
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>>>
>>>>>
>>>>> On Mar 14, 2011, at 6:10 PM, "Mattmann, Chris A (388J)" <chris.a.mattmann at jpl.nasa.gov> wrote:
>>>>>
>>>>>> Hi Sean,
>>>>>>
>>>>>> Quick question:
>>>>>>
>>>>>> Why not just declare:
>>>>>>
>>>>>> <item>...
>>>>>> <datacasting:maxWindSpeed>81.2</datacasting:maxWindSpeed>
>>>>>> </item>
>>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>> On Mar 14, 2011, at 2:19 PM, Mccleese, Sean W (388A) wrote:
>>>>>>
>>>>>>> As part of Andrew Bingham's Datacasting team, I would like to put forth a proposal for DCP-2 (or beyond) to add support for custom elements in Datacasting (or *casting) RSS & Atom feeds.
>>>>>>>
>>>>>>> The purpose of "custom elements" is to enable data providers to define and use domain-specific RSS/Atom elements from within the feed itself without foreknowledge on the part of the user or RSS/Atom viewer. This would enable, for example, a data provider of Datacasting RSS feeds tracking hurricanes to define "Max Wind Speed" as a valid RSS/Atom item element and then use that element within the feed. This would be accomplished through a two-step process: channel-level definitions and item-level use.
>>>>>>>
>>>>>>> A channel-level definition would define the name of the element, its core data type, and the units the element represents. Ex:
>>>>>>> <datacasting:customEltDef name="maxWindSpeed" type="float" units="mph" />
>>>>>>>
>>>>>>> This would inform the feed reader that the item "maxWindSpeed" is present within the feed with a floating point number data type and that the physical representation of the data is in MPH. Then, in order for this data type to appear within the feed's item-level information, the following example is illustrative:
>>>>>>>
>>>>>>> (for RSS):
>>>>>>> <item> ....
>>>>>>> <datacasting:customElement name="maxWindSpeed" value="81.2"/>
>>>>>>> </item>
>>>>>>>
>>>>>>> (for Atom):
>>>>>>> <entry>
>>>>>>> ...
>>>>>>> <datacasting:customElement name="maxWindSpeed" value="81.2"/>
>>>>>>> </entry>
>>>>>>>
>>>>>>> Thus, when viewing the item/entry in the appropriate feed reader, the "maxWindSpeed" element would be displayed along with all the other relevant metadata (e.g. description, author, etc).
>>>>>>>
>>>>>>> I'll add this to the Discovery Change Proposals page, but I wanted to make sure to inform the list about this proposed idea. If you're interested in seeing a working representation of this in action, check out the GHRSST AMSR-E Datacasting RSS feed (http://ghrsst.jpl.nasa.gov/datacasting/AMSRE-L2P-gen.xml), which can be read with the Datacasting Feed Reader (http://datacasting.jpl.nasa.gov).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -Sean McCleese
>>>>>>> _______________________________________________
>>>>>>> Esip-discovery mailing list
>>>>>>> Esip-discovery at lists.esipfed.org
>>>>>>> http://www.lists.esipfed.org/mailman/listinfo/esip-discovery
>>>>>>
>>>>>>
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>> Chris Mattmann, Ph.D.
>>>>>> Senior Computer Scientist
>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>> Office: 171-266B, Mailstop: 171-246
>>>>>> Email: chris.a.mattmann at nasa.gov
>>>>>> WWW:   http://sunset.usc.edu/~mattmann/
>>>>>> Phone: +1 (818) 354-8810
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>> Adjunct Assistant Professor, Computer Science Department
>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
> Christopher Lynnes
> Goddard Earth Sciences Data and Information Center, NASA/GSFC
> 301-614-5185
>

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann at nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
Phone: +1 (818) 354-8810
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++