[Esip-discovery] Datacasting custom elements proposal
Mccleese, Sean W (388A)
sean.w.mccleese at jpl.nasa.gov
Thu Mar 17 17:00:59 EDT 2011
Sorry it took me a while to reply!
Regarding the tag collision issue I keep bringing up, I think I've been a little vague about my concerns. I'm imagining the following scenario:
Let's say there is a tag "datacasting:dataSource" that the spec pre-defines to represent the originating data provider. This seems plausible, since the "datacasting" namespace would want to define some commonly used *casting attributes, such as data provider, acquisition time, etc. Within a particular feed, a data provider could also define a custom element "dataSource" to mean something else entirely. A client application written to interpret "dataSource" would then assume the spec's definition, potentially resulting in unpredictable behavior. Obviously the feed provider isn't following the spec in this case, but I think it's a plausible situation that the design should guard against.
Restricting custom elements to attributes within the "datacasting:customElement" tag would make such collisions significantly less likely. Similarly, the channel-level definitions for custom elements allow us to define units as well as datatype. Those *could* be defined on a per-tag basis, but that would clutter up the XML and create serious headaches for implementers when datatypes differ between items (or are missing altogether).
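To make the separation concrete, here is a minimal Python sketch of a client that keeps spec-defined tags and attribute-based custom elements in two separate lookups. It is only an illustration under assumptions: the namespace URI is a placeholder (the real Datacasting URI is not given in this thread), "datacasting:dataSource" is the hypothetical reserved tag from the scenario above, and split_item is a made-up helper, not part of any Datacasting code.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption); the real one is not stated in this thread.
DC_NS = "http://example.org/datacasting"

# One item that uses the hypothetical reserved tag AND a custom element
# that happens to share the name "dataSource".
ITEM_XML = f"""
<item xmlns:datacasting="{DC_NS}">
  <datacasting:dataSource>NODC</datacasting:dataSource>
  <datacasting:customElement name="dataSource" value="buoy-17"/>
</item>
"""

def split_item(item):
    """Return (reserved, custom) dicts so the two kinds of metadata never mix."""
    reserved, custom = {}, {}
    for child in item:
        if child.tag == f"{{{DC_NS}}}customElement":
            # Custom elements carry their name and value in attributes.
            custom[child.get("name")] = child.get("value")
        elif child.tag.startswith(f"{{{DC_NS}}}"):
            # Any other tag in the namespace is treated as spec-defined.
            local_name = child.tag.split("}", 1)[1]
            reserved[local_name] = child.text
    return reserved, custom

reserved, custom = split_item(ET.fromstring(ITEM_XML))
print(reserved)
print(custom)
```

Because the custom "dataSource" only ever appears as an attribute value, it cannot shadow the reserved tag of the same name.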
I definitely see your point about validation against attributes, though it's sort of a tradeoff:
-If we use raw tag names as de facto declarations of custom elements and simply add datatype/unit information to each tag (or even through some other channel-level mechanism), it is hard or impossible to verify things like collisions and datatype consistency.
-Whereas if we use the "datacasting:customElement" tag with attributes defining the custom element's meaning, it becomes hard or impossible to use schema validation to verify that per-item/entry custom element usage matches the channel-level definitions, but we avoid the collision problem.
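With the attribute approach, the matching between definitions and usage has to be checked in client code rather than by a schema. A rough Python sketch of such a check follows. The namespace URI, the sample feed, the "pressure" element, the check_feed helper, and the "int"/"string" type names are all assumptions for illustration; only type="float" appears in the proposal.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://example.org/datacasting"  # placeholder URI (assumption)
NS = {"datacasting": DC_NS}

FEED_XML = f"""
<rss version="2.0" xmlns:datacasting="{DC_NS}">
  <channel>
    <datacasting:customEltDef name="maxWindSpeed" type="float" units="mph"/>
    <item>
      <datacasting:customElement name="maxWindSpeed" value="81.2"/>
    </item>
    <item>
      <datacasting:customElement name="maxWindSpeed" value="strong"/>
      <datacasting:customElement name="pressure" value="990"/>
    </item>
  </channel>
</rss>
"""

# Assumed mapping from the "type" attribute to a Python converter.
CONVERTERS = {"float": float, "int": int, "string": str}

def check_feed(root):
    """Return (item_index, name, problem) tuples for inconsistent usages."""
    channel = root.find("channel")
    declared = {d.get("name"): d.get("type")
                for d in channel.findall("datacasting:customEltDef", NS)}
    problems = []
    for index, item in enumerate(channel.findall("item")):
        for elt in item.findall("datacasting:customElement", NS):
            name, value = elt.get("name"), elt.get("value")
            if name not in declared:
                problems.append((index, name, "undeclared"))
                continue
            # Unknown type names fall back to str, which accepts any text.
            convert = CONVERTERS.get(declared[name], str)
            try:
                convert(value)
            except (TypeError, ValueError):
                problems.append((index, name, "bad value"))
    return problems

problems = check_feed(ET.fromstring(FEED_XML))
print(problems)
```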
An additional wrinkle in the tag vs. attribute conversation is one of run-time foreknowledge of custom element existence. The Datacasting Feed Reader we have implemented allows for users to "filter" feeds based on tag metadata. So, for example, I could filter a feed based on "dataSource='NODC'" and the Feed Reader would then display only items/entries within the selected feed whose "dataSource" tag equaled "NODC".
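As a rough illustration of that kind of filter (not the Feed Reader's actual code), the Python sketch below selects items whose attribute-based custom element matches a name/value pair. The namespace URI, the sample feed, and the filter_items helper are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://example.org/datacasting"  # placeholder URI (assumption)
NS = {"datacasting": DC_NS}

FEED_XML = f"""
<rss version="2.0" xmlns:datacasting="{DC_NS}">
  <channel>
    <item>
      <title>A</title>
      <datacasting:customElement name="dataSource" value="NODC"/>
    </item>
    <item>
      <title>B</title>
      <datacasting:customElement name="dataSource" value="OTHER"/>
    </item>
    <item>
      <title>C</title>
    </item>
  </channel>
</rss>
"""

def filter_items(channel, name, wanted):
    """Return the items that carry a custom element `name` equal to `wanted`."""
    matches = []
    for item in channel.findall("item"):
        for elt in item.findall("datacasting:customElement", NS):
            if elt.get("name") == name and elt.get("value") == wanted:
                matches.append(item)
                break  # one match per item is enough
    return matches

channel = ET.fromstring(FEED_XML).find("channel")
titles = [item.findtext("title") for item in filter_items(channel, "dataSource", "NODC")]
print(titles)
```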
However, we can easily imagine some metadata tags that may not be present in all items (e.g., when some physical phenomenon is not measured). I can imagine a frequently updated feed in which some metadata tags are present in only a small fraction of items. In that case, there could be times when the RSS .xml file being served does not contain any instances of the tag in question. That would preclude the Feed Reader (or any feed reader I can imagine) from providing filtering support for that tag, as the feed reader wouldn't be aware that the tag exists at all.
This leads to the reasoning behind the channel-level custom element definitions: a feed reader should interpret them as a comprehensive list of all custom elements that may appear in the feed. The currently proposed spec has the "datacasting:customEltDef" channel-level tag with attributes linking it to the item-level "datacasting:customElement" tag. We could, I suppose, continue to use "datacasting:customEltDef" with attributes that link to tag-level custom names (like your example of <datacasting:windSpeed>); however, I think it's a bit more "graceful" (for lack of a better word) to have both channel- and tag-level custom elements define their mapping through attributes.
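A small Python sketch of why the channel-level list matters for filtering: the reader can offer every declared element as a filter, even one that no item in the current file happens to use. The namespace URI, the "rainRate" element and its units, and the filterable_fields helper are hypothetical names introduced only for this example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://example.org/datacasting"  # placeholder URI (assumption)
NS = {"datacasting": DC_NS}

FEED_XML = f"""
<rss version="2.0" xmlns:datacasting="{DC_NS}">
  <channel>
    <datacasting:customEltDef name="maxWindSpeed" type="float" units="mph"/>
    <datacasting:customEltDef name="rainRate" type="float" units="mm/hr"/>
    <item>
      <datacasting:customElement name="maxWindSpeed" value="81.2"/>
    </item>
  </channel>
</rss>
"""

def filterable_fields(channel):
    """Return (declared, observed) sorted lists of custom element names."""
    declared = {d.get("name")
                for d in channel.findall("datacasting:customEltDef", NS)}
    # iter() walks every descendant, so this sees usages in all items.
    observed = {e.get("name")
                for e in channel.iter(f"{{{DC_NS}}}customElement")}
    return sorted(declared), sorted(observed)

channel = ET.fromstring(FEED_XML).find("channel")
declared, observed = filterable_fields(channel)
print(declared)
print(observed)
```

Here "rainRate" shows up only in the declared list, so a reader relying on observed tags alone could not offer it as a filter.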
Long-winded response, I know, but I think this is a topic that merits discussion for sure.
From: Mattmann, Chris A (388J)
Sent: Monday, March 14, 2011 10:26 PM
To: Mccleese, Sean W (388A)
Cc: esip-discovery at lists.esipfed.org
Subject: Re: [Esip-discovery] Datacasting custom elements proposal
> We used ROME for our implementation of our Datacasting client as well, which is how we're handling the custom elements at this point.
> We definitely could namespace all the tags (datacasting:maxWindSpeed, etc), however in some cases we may want "reserved" namespaced tags (i.e. datacasting:guid or datacasting:datacenter, etc) and we would have to carefully deal with collision situations there, whereas using attributes completely eliminates that potential pitfall.
I'm not seeing this as a problem? So long as you namespace the tags, it wouldn't matter if you had datacasting:guid, as well as chris:guid, so long as xmlns:datacasting, and xmlns:chris were defined, no? That's the whole point of XML namespacing and schema, to tackle the namespace issue.
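A quick Python sketch of the namespacing point: a namespace-aware parser expands each prefix to its URI, so the two "guid" tags resolve to different qualified names. Both URIs below are placeholders made up for illustration, not real namespace URIs.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions for this example only).
DC_NS = "http://example.org/datacasting"
CHRIS_NS = "http://example.org/chris"

ITEM_XML = f"""
<item xmlns:datacasting="{DC_NS}" xmlns:chris="{CHRIS_NS}">
  <datacasting:guid>dc-001</datacasting:guid>
  <chris:guid>chris-42</chris:guid>
</item>
"""

item = ET.fromstring(ITEM_XML)

# ElementTree stores tags as "{namespace-uri}localname".
tags = [child.tag for child in item]

ns = {"datacasting": DC_NS, "chris": CHRIS_NS}
dc_guid = item.findtext("datacasting:guid", namespaces=ns)
chris_guid = item.findtext("chris:guid", namespaces=ns)

print(tags)
print(dc_guid, chris_guid)
```

This only addresses collisions across namespaces; it does not help when a reserved tag and a custom tag both live in the "datacasting" namespace.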
Attributes may obviate the need to deal with xmlns and Schema, but they add a complication elsewhere, e.g., validation (can't use schema).
That said, I say this being a guy that uses attributes a ton in a lot of the XML I write and standards that I participate in :)
> We figured that the "custom element" concept is a tag-level concept whereas the name/value/etc is more of a descriptor for that type, similar to the "link" tag in Atom.
> In regards to the specification question: You might be right -- that may not be an actual requirement of the specification. I may have misspoken on that. Every RSS/Atom reader I've used, though, as a best practice does ignore tags that are unrecognized. I guess that's what I was trying to say. Sorry for the mis-application of specifications!
No worries! I'm not an expert for sure, and was more trying to figure out the design choices and rationale behind them.
Thanks for taking the time to answer my questions!
> From: Mattmann, Chris A (388J)
> Sent: Monday, March 14, 2011 7:03 PM
> To: Mccleese, Sean W (388A)
> Cc: esip-discovery at lists.esipfed.org
> Subject: Re: [Esip-discovery] Datacasting custom elements proposal
> Hey Sean,
>> Great question!
>> The reason we went with the element name in the tag attributes instead of the tag name itself is that a lot of RSS/Atom parsers will look for "understood" tags by their tag name and, in finding one that's unrecognized, will simply move on to the next tag.
> Can you name a few examples? I'd be interested in trying some out. I'm most familiar with the Java ROME API, but also have used commons-feedparser in the past (no longer maintained, but was pretty robust back when I wrote an RSS parser for Nutch in 2005).
>> We didn't want the "datacasting" namespace to, as a default action, encompass all possible tag names (with possible conflicts, etc, arising from that).
> Couldn't you just namespace the other tags? In other words, it would be perfectly acceptable to have:
> <?xml version="1.0"
> xmlns:dc="...dublin core uri..."
> xmlns:datacasting="...data casting uri..."
> <dc:Title>My FOO RSS Item</dc:Title>
> <foo:barElem xmlns:foo="blah blah uri">Some FOO value</foo:barElem>
>> By putting it in the attributes for the tag itself, we can override the "ignore if unrecognized" requirement for the RSS spec, which is where the channel-level definitions come into play. That also helps with the datatype / units declarations as well.
> What version of the RSS spec has this? I'm familiar with 2.0, is that the version you're talking about?
>> On Mar 14, 2011, at 6:10 PM, "Mattmann, Chris A (388J)" <chris.a.mattmann at jpl.nasa.gov> wrote:
>>> Hi Sean,
>>> Quick question:
>>> Why not just declare:
>>> On Mar 14, 2011, at 2:19 PM, Mccleese, Sean W (388A) wrote:
>>>> As part of Andrew Bingham's Datacasting team, I would like to put forth a proposal for DCP-2 (or beyond) to add support for custom elements in Datacasting (or *casting) RSS & Atom feeds.
>>>> The purpose of "custom elements" is to enable data providers to define and use domain-specific RSS/Atom elements from within the feed itself without foreknowledge on the part of the user or RSS/Atom viewer. This would enable, for example, a data provider of Datacasting RSS feeds tracking hurricanes to define "Max Wind Speed" as a valid RSS/Atom item element and then use that element within the feed. This would be accomplished through a two-step process: channel-level definitions and item-level use.
>>>> A channel-level definition would define the name of the element, its core data type, and the units the element represents. Ex:
>>>> <datacasting:customEltDef name="maxWindSpeed" type="float" units="mph" />
>>>> This would inform the feed reader that the item "maxWindSpeed" is present within the feed with a floating point number data type and that the physical representation of the data is in MPH. Then, in order for this data type to appear within the feed's item-level information, the following example is illustrative:
>>>> (for RSS):
>>>> <item> ....
>>>> <datacasting:customElement name="maxWindSpeed" value="81.2"/>
>>>> (for Atom):
>>>> <datacasting:customElement name="maxWindSpeed" value="81.2"/>
>>>> Thus, when viewing the item/entry in the appropriate feed reader, the "maxWindSpeed" element would be displayed along with all the other relevant metadata (e.g. description, author, etc).
>>>> I'll add this to the Discovery Change Proposals page, but I wanted to make sure to inform the list about this proposed idea. If you're interested in seeing a working representation of this in action, check out the GHRSST AMSR-E DatacastingRSS feed (http://ghrsst.jpl.nasa.gov/datacasting/AMSRE-L2P-gen.xml) which can be read with the Datacasting Feed Reader (http://datacasting.jpl.nasa.gov)
>>>> -Sean McCleese
>>> Chris Mattmann, Ph.D.
>>> Senior Computer Scientist
>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 171-266B, Mailstop: 171-246
>>> Email: chris.a.mattmann at nasa.gov
>>> WWW: http://sunset.usc.edu/~mattmann/
>>> Phone: +1 (818) 354-8810
>>> Adjunct Assistant Professor, Computer Science Department
>>> University of Southern California, Los Angeles, CA 90089 USA