[Esip-discovery] Datacasting custom elements proposal

Mccleese, Sean W (388A) sean.w.mccleese at jpl.nasa.gov
Fri Mar 18 13:36:09 EDT 2011


To address some of the things mentioned in the last couple of emails:

The primary difference between <datacasting:customElement="maxWindSpeed"> and <datacasting:maxWindSpeed> is that the latter requires datacasting:maxWindSpeed to be declared in an XML schema and the former does not. Declaring something in a schema is a fairly heavyweight operation, especially because most schema definition languages are fairly opaque. You also have to make the schema available independently via a long-lived URL, and every client has to read the schema to determine simple things like data type. With the attribute method, quick-and-dirty clients (e.g. Python scripts) can skip validation entirely, assume the XML is correct, and extract data. That is not possible with the schema method: the data type is buried in the schema, so parsing the schema is mandatory to do anything general with the data. With the attribute method, you can ignore the schema. While that's not recommended for production clients, we should not ignore the one-off script kinds of uses.
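To make the one-off script case concrete, here is a minimal Python sketch of such a client. The element name customElement with name/value attributes, the namespace URI, and the sample feed are all assumptions for illustration, not the agreed syntax. The script never loads a schema; it just assumes the XML is correct and pulls the values out.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and attribute names, for illustration only.
DC_NS = "http://example.org/datacasting"

FEED = f"""<rss version="2.0" xmlns:datacasting="{DC_NS}">
  <channel>
    <title>Storm feed</title>
    <item>
      <title>Storm A</title>
      <datacasting:customElement name="maxWindSpeed" value="150"/>
    </item>
    <item>
      <title>Storm B</title>
      <datacasting:customElement name="maxWindSpeed" value="95"/>
    </item>
  </channel>
</rss>"""


def extract(feed_xml, wanted):
    """Return (item title, raw string value) pairs for one custom element name."""
    root = ET.fromstring(feed_xml)
    tag = f"{{{DC_NS}}}customElement"
    results = []
    for item in root.iter("item"):
        for elt in item.findall(tag):
            if elt.get("name") == wanted:
                results.append((item.findtext("title"), elt.get("value")))
    return results


# No schema is fetched or validated anywhere in this script.
wind_speeds = extract(FEED, "maxWindSpeed")
print(wind_speeds)
```

The values come back as raw strings here; a script that needs numbers has to convert them itself or look up the declared type in the channel.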

Furthermore, in many situations errors or mismatches in the schema will be reported by some deeply embedded part of the XML parsing stack, which is likely to make it harder for clients to parse these errors and present them intelligibly to users. This would turn custom element metadata errors into structural errors rather than data errors. The whole thing boils down to the question of whether custom metadata is a "data" issue or a "structural" issue.
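As a rough illustration of treating a bad value as a data error, the Python sketch below checks each value against a type declared in the channel and collects readable messages instead of failing inside the parser. The customEltDef and customElement names, their attributes, the type names, and the namespace URI are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://example.org/datacasting"  # hypothetical namespace URI

FEED = f"""<rss version="2.0" xmlns:datacasting="{DC_NS}">
  <channel>
    <datacasting:customEltDef name="maxWindSpeed" type="float"/>
    <item>
      <title>Storm A</title>
      <datacasting:customElement name="maxWindSpeed" value="150"/>
    </item>
    <item>
      <title>Storm B</title>
      <datacasting:customElement name="maxWindSpeed" value="fast"/>
    </item>
  </channel>
</rss>"""

# Assumed type names mapped to Python converters.
CASTS = {"float": float, "integer": int, "string": str}


def check(feed_xml):
    """Return one readable message per value that does not match its declared type."""
    channel = ET.fromstring(feed_xml).find("channel")
    types = {d.get("name"): d.get("type")
             for d in channel.findall(f"{{{DC_NS}}}customEltDef")}
    errors = []
    for item in channel.findall("item"):
        title = item.findtext("title")
        for elt in item.findall(f"{{{DC_NS}}}customElement"):
            name, raw = elt.get("name"), elt.get("value")
            type_name = types.get(name, "string")
            try:
                CASTS.get(type_name, str)(raw)
            except ValueError:
                errors.append(
                    f"item '{title}': {name}={raw!r} is not a valid {type_name}")
    return errors


errors = check(FEED)
print(errors)
```

The feed still parses as well-formed XML, so the client decides how to report the problem and can keep processing the remaining items.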

One thing worth noting is that, as of this moment, the Datacasting team under Andrew Bingham has the attribute method working and implemented in the Datacasting Feed Reader (http://datacasting.jpl.nasa.gov). What we have discovered through customer use cases is that the custom metadata is of prime importance to users (as is somewhat predictable) but the burden on data providers to create conforming custom metadata is fairly high. We have seen data providers struggle with the implementation of custom metadata even in the "simpler" case of the attribute lists -- and if they are required to auto-generate XML schemas and update those schemas as custom metadata is added/removed, it will likely further encumber the process of custom metadata injection.

I would contend that by self-describing the metadata within the RSS/Atom channel metadata and using these definitions through the tag attributes, we alleviate all of these problems while maintaining the required functionality. It even fits with established Atom concepts (e.g. the "link" tag).
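A minimal Python sketch of what this could look like from the client side: the definitions are read from the channel and then used to type the values found in each item. The customEltDef and customElement names, their attributes, the type names, and the namespace URI are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://example.org/datacasting"  # hypothetical namespace URI

FEED = f"""<rss version="2.0" xmlns:datacasting="{DC_NS}">
  <channel>
    <datacasting:customEltDef name="maxWindSpeed" type="float"/>
    <datacasting:customEltDef name="stormName" type="string"/>
    <item>
      <datacasting:customElement name="maxWindSpeed" value="150"/>
      <datacasting:customElement name="stormName" value="Alpha"/>
    </item>
  </channel>
</rss>"""

# Assumed type names mapped to Python converters.
CASTS = {"float": float, "integer": int, "string": str}


def typed_items(feed_xml):
    """Return one dict per item, with values converted per the channel definitions."""
    channel = ET.fromstring(feed_xml).find("channel")
    # The feed describes its own custom metadata; no external schema is read.
    types = {d.get("name"): d.get("type")
             for d in channel.findall(f"{{{DC_NS}}}customEltDef")}
    records = []
    for item in channel.findall("item"):
        record = {}
        for elt in item.findall(f"{{{DC_NS}}}customElement"):
            name = elt.get("name")
            cast = CASTS.get(types.get(name), str)  # undeclared names stay strings
            record[name] = cast(elt.get("value"))
        records.append(record)
    return records


records = typed_items(FEED)
print(records)
```

Adding or removing a custom element then only touches the feed itself: one definition in the channel plus the values in the items.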

Basically, I think schema should be relatively static documents describing the structure of the file, rather than highly dynamic documents that change with every revision of every feed.

If we do decide to go with the schema method, there must be a namespace dedicated ONLY to custom elements - nothing else.  That allows predefined structural things like datacasting:dataSource to be added in the future, which is not possible if you simply reserve some names out of the namespace and then open it up to the world.  By the same token, the schema method really requires each feed or set of related feeds to have its own namespace, to avoid any possibility of collision... because collisions have structural implications (e.g. the data type might be different). Chris has asked to see some data on collision probability and I think given the schema implications that's an excellent suggestion. 

Furthermore, only one group can "own" a schema at a time, so multiple groups cannot share one schema without a lot of coordination for updates.  With the attribute method, collisions could occur but are not a problem because the metadata is already isolated within the channel.
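The channel-level isolation can be sketched in a few lines of Python. Two feeds declare the same custom name with different types, and because each client reads definitions from the channel it is parsing, neither affects the other. The customEltDef name, its attributes, and the namespace URI are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://example.org/datacasting"  # hypothetical namespace URI


def feed(type_name):
    """Build a tiny feed whose channel declares 'intensity' with the given type."""
    return f"""<rss version="2.0" xmlns:datacasting="{DC_NS}">
  <channel>
    <datacasting:customEltDef name="intensity" type="{type_name}"/>
  </channel>
</rss>"""


def definitions(feed_xml):
    """Return the name -> type mapping declared in this feed's channel only."""
    channel = ET.fromstring(feed_xml).find("channel")
    return {d.get("name"): d.get("type")
            for d in channel.findall(f"{{{DC_NS}}}customEltDef")}


# Same name, different meaning in each feed; the definitions never mix.
hurricane_defs = definitions(feed("integer"))
quake_defs = definitions(feed("float"))
print(hurricane_defs, quake_defs)
```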

-Sean

-----Original Message-----
From: Mattmann, Chris A (388J) 
Sent: Friday, March 18, 2011 6:41 AM
To: Hua, Hook (388C)
Cc: Mccleese, Sean W (388A); esip-discovery at lists.esipfed.org
Subject: Re: [Esip-discovery] Datacasting custom elements proposal

Hi Hook,

On Mar 17, 2011, at 11:34 PM, Hua, Hook (388C) wrote:

> Hi Sean and Chris,
> 
> (1) I recall we had a similar discussion back in the Federated OpenSearch Cluster about whether we should assume that the clients are namespace aware. There could be some readers that are badly written (e.g. not using formal XML parsers) and therefore not compliant. All we can do on the server side is to do the right thing and always use namespaces.

+1.

> (2) Sean, your points on custom tags could also be equally valid for OpenSearch and ServiceCasting responses as well. Your example custom tag for dataSource could also be a custom tag for the other Discovery responses.
> 
> So should we generalize your proposal across the other Discovery services too? If so, then we probably shouldn't use a "datacasting" namespace as in:
> 
> <datacasting:dataSource>foo</datacasting:dataSource>
> 
> But for the sake of making progress, I could see the need to concentrate on one service type at a time. But then again, there are overlaps.

I'd say it might be good to consider some following research questions and to actually generate some #s behind them before getting far down any path. I've heard a lot of concern regarding collision. Does anyone know:

1. what the frequency % of collision is in at least some X use cases? IOW, how often does this happen? I have my own views on this BTW but I'd rather just see some real data points.
2. what are the top Y feed readers we're targeting? I would hope the answer is not Y = all of them. Downstream users and consumers can always think of ways to break "standards" and rather than design for those cases, I think resources and effort are best spent designing for the actual ones first (starting small then growing big).

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann at nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
Phone: +1 (818) 354-8810
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


