[Esip-discovery] Datacasting custom elements proposal

Sat Mar 19 04:57:24 EDT 2011

Chris,

No problem! A great sample case is illustrated in one of the live Datacasting RSS feeds that has been implemented at PO.DAAC for GHRSST AMSR-E (http://ghrsst.jpl.nasa.gov/datacasting/AMSRE-L2P-gen.xml). 

Basically, the example scenario is this: PO.DAAC is casting out data granules for GHRSST AMSR-E, and these granules have metadata that is specific to that particular platform, for example "Max. deviation from previous day SST". PO.DAAC defines this metadata in the RSS channel in the following way:

<datacasting:customEltDef units="Kelvin" displayName="Max. deviation from previous day SST" type="float" name="MaxDT_Analysis"/>

and then, in the GHRSST AMSR-E feed, it's used in an item as follows:

<datacasting:customElement name="MaxDT_Analysis" value="0.00000"/>

This enables a Feed Reader to parse the "MaxDT_Analysis" tag with knowledge of the data type (floating point, in this case) as well as present some information to the user about the physical quantity being measured. In the GHRSST AMSR-E feed, this is done along with 10 or so other "custom metadata elements". 
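A minimal Python sketch of how a reader might use these definitions. It assumes a placeholder namespace URI (the real Datacasting namespace is not given here) and a small type vocabulary of which only "float" appears above. It builds a lookup table from the channel's customEltDef elements and converts each item's customElement value accordingly, using only the standard library and no schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder; the real Datacasting namespace URI may differ.
DC_NS = "http://example.org/datacasting"

FEED = f"""<rss version="2.0" xmlns:datacasting="{DC_NS}">
  <channel>
    <datacasting:customEltDef units="Kelvin" displayName="Max. deviation from previous day SST"
        type="float" name="MaxDT_Analysis"/>
    <item>
      <datacasting:customElement name="MaxDT_Analysis" value="0.00000"/>
    </item>
  </channel>
</rss>"""

# Assumed mapping from the "type" attribute to a Python converter.
CONVERTERS = {"float": float, "int": int, "string": str}


def read_custom_metadata(xml_text):
    """Return one dict per item: element name -> (value, units, displayName)."""
    channel = ET.fromstring(xml_text).find("channel")
    # Channel-level definitions, keyed by their name attribute.
    defs = {d.get("name"): d.attrib for d in channel.findall(f"{{{DC_NS}}}customEltDef")}
    items = []
    for item in channel.findall("item"):
        values = {}
        for elt in item.findall(f"{{{DC_NS}}}customElement"):
            name = elt.get("name")
            d = defs.get(name, {})
            convert = CONVERTERS.get(d.get("type"), str)  # unknown type: keep the string
            values[name] = (convert(elt.get("value")), d.get("units"), d.get("displayName"))
        items.append(values)
    return items


print(read_custom_metadata(FEED))
# [{'MaxDT_Analysis': (0.0, 'Kelvin', 'Max. deviation from previous day SST')}]
```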

So the scenario I'm addressing is one where each data provider may have metadata, specific to the particular granules or data feeds it is casting out, that is not known ahead of time by the Feed Reader or by ESIP. One can even envision Datacasting feeds being generated automatically as responses from OpenSearch or some other service, in which case the custom metadata may be specific to that particular query (though I would note this has never been implemented).

I do not think this *has* to be limited to datacasting, though for now I am only proposing it as a datacasting standard simply because that's how we've implemented it up until now.
________________________________________
From: Lynnes, Christopher S. (GSFC-6102) [christopher.s.lynnes at nasa.gov]
Sent: Friday, March 18, 2011 11:49 AM
To: Mccleese, Sean W (388A)
Cc: Mattmann, Chris A (388J); Hua, Hook (388C); esip-discovery at lists.esipfed.org
Subject: Re: [Esip-discovery] Datacasting custom elements proposal

Sean,
  Perhaps I need to walk through a sample case to fully comprehend what you are proposing.  For instance, we are likely to propose a DayNightFlag as an ESIP extension to the main OpenSearch spec.  We also want to convey the fact that it can take only one of four values: "Day", "Night", "Day+Night", "N/A".
  If we extend OpenSearch the way the OpenSearch folks have been extending their standard, we would add elements (cf. the Response section of the OpenSearch Geo Extension), e.g., <esipdiscovery:daynightflag>Day</esipdiscovery:daynightflag>. Then we would use <xsd:enumeration> elements in the schema to communicate the four valid values.
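A minimal sketch of the element-style extension described above, as seen by a client that does not validate against the schema. It locates esipdiscovery:daynightflag by namespace and checks it against the four listed values. Because the <xsd:enumeration> lives only in the schema, such a client has to repeat those values in its own code. The namespace URI and the <entry> wrapper are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder namespace for the proposed ESIP discovery extension.
ESIP_NS = "http://example.org/esipdiscovery"

# The four values listed above; a non-validating client must duplicate the schema's enumeration.
DAY_NIGHT_VALUES = {"Day", "Night", "Day+Night", "N/A"}


def day_night_flag(entry_xml):
    """Return the daynightflag text of one response entry, or None if absent."""
    flag = ET.fromstring(entry_xml).findtext(f"{{{ESIP_NS}}}daynightflag")
    if flag is not None and flag not in DAY_NIGHT_VALUES:
        raise ValueError(f"invalid daynightflag: {flag!r}")
    return flag


example = (
    f'<entry xmlns:esipdiscovery="{ESIP_NS}">'
    "<esipdiscovery:daynightflag>Day</esipdiscovery:daynightflag>"
    "</entry>"
)
print(day_night_flag(example))  # Day
```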
  How would that look in your attribute scheme?
  Or are you just suggesting the attribute method for custom elements that are *not* proposed in their own right as ESIP extensions?  That is, custom elements that are strictly application-dependent?
  Or are you suggesting this method only for datacasting and not for OpenSearch responses?

On Mar 18, 2011, at 1:36 PM, Mccleese, Sean W (388A) wrote:

> To address some of the things mentioned in the last couple emails:
>
> The primary difference between <datacasting:customElement name="maxWindSpeed"> and <datacasting:maxWindSpeed> is that the latter requires datacasting:maxWindSpeed to be declared in an XML schema and the former does not. Declaring something in a schema is a fairly heavyweight operation, especially because most schema definition languages are fairly opaque. Not only that, but you have to make the schema available independently via a long-lived URL. It also means that every client has to read the schema to determine simple things like data type. With the attribute method, quick-and-dirty clients (e.g. Python scripts) could ignore validation entirely, assume the XML is correct, and extract data. This is not possible with the schema method because the data type is buried in the schema, so parsing the schema is mandatory to do anything general with the data. In the keyword case, you can ignore the schema. While that's not recommended for production clients, we should not ignore one-off script kinds of uses.
>
> Furthermore, in many situations errors or mismatches in the schema will be reported by some deeply embedded part of the XML parsing stack, which is likely to make it harder for clients to catch these errors and present them intelligibly to the user(s). In effect, this would turn custom metadata errors into structural errors rather than data errors. The whole question boils down to whether custom metadata is a "data" issue or a "structural" issue.
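To illustrate the "data error" view, here is a sketch in which a value that fails conversion is recorded against that one element while the rest of the item still parses. A schema-validating parser would typically reject the whole document instead. The namespace URI is a placeholder, and the name-to-type map is assumed to have already been read from the channel's customEltDef elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder; the real Datacasting namespace URI may differ.
DC_NS = "http://example.org/datacasting"
CONVERTERS = {"float": float, "int": int}  # assumed type vocabulary


def convert_values(item_xml, types):
    """Convert an item's customElement values using a name -> type map.

    Returns (values, errors). A value that fails conversion is reported as a
    data error for that one element; the remaining elements are still returned.
    """
    values, errors = {}, []
    for elt in ET.fromstring(item_xml).findall(f"{{{DC_NS}}}customElement"):
        name, raw = elt.get("name"), elt.get("value")
        declared = types.get(name)
        try:
            values[name] = CONVERTERS.get(declared, str)(raw)
        except (TypeError, ValueError):
            errors.append(f"{name}: {raw!r} is not a valid {declared}")
    return values, errors


sample_item = (
    f'<item xmlns:datacasting="{DC_NS}">'
    '<datacasting:customElement name="MaxDT_Analysis" value="0.5"/>'
    '<datacasting:customElement name="MinDT_Analysis" value="n/a"/>'
    "</item>"
)
print(convert_values(sample_item, {"MaxDT_Analysis": "float", "MinDT_Analysis": "float"}))
```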
>
> One thing worth noting is that, as of this moment, the Datacasting team under Andrew Bingham has the attribute method working and implemented in the Datacasting Feed Reader (http://datacasting.jpl.nasa.gov). What we have discovered through customer use cases is that the custom metadata is of prime importance to users (as is somewhat predictable) but the burden on data providers to create conforming custom metadata is fairly high. We have seen data providers struggle to implement custom metadata even in the "simpler" case of attribute lists, and if they are required to auto-generate XML schemas and update those schemas as custom metadata is added or removed, it will likely further encumber the process of custom metadata injection.
>
> I would contend that by self-describing the metadata within the RSS/Atom channel metadata and referencing these definitions through tag attributes, we alleviate all of these problems while maintaining the required functionality. It even fits with established Atom concepts (e.g. the "link" tag).
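A minimal provider-side sketch, assuming the same kind of placeholder namespace URI. The definitions and the item values are generated from plain Python data with ElementTree, so adding or removing a custom element changes only the feed, not a separately published schema. The function build_channel and its argument layout are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder; the real Datacasting namespace URI may differ.
DC_NS = "http://example.org/datacasting"
ET.register_namespace("datacasting", DC_NS)


def build_channel(defs, items):
    """Build an RSS document whose custom metadata is self-described.

    defs:  list of attribute dicts (name, type, units, displayName) for customEltDef.
    items: list of dicts mapping a custom element name to its value.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Channel-level definitions: adding a new element is just one more dict here.
    for attrs in defs:
        ET.SubElement(channel, f"{{{DC_NS}}}customEltDef", attrs)
    # Each item carries name/value pairs that refer back to those definitions.
    for values in items:
        item = ET.SubElement(channel, "item")
        for name, value in values.items():
            ET.SubElement(item, f"{{{DC_NS}}}customElement", name=name, value=str(value))
    return ET.tostring(rss, encoding="unicode")


xml_text = build_channel(
    [{"name": "MaxDT_Analysis", "type": "float", "units": "Kelvin",
      "displayName": "Max. deviation from previous day SST"}],
    [{"MaxDT_Analysis": 0.0}],
)
print(xml_text)
```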
>
> Basically, I think schemas should be relatively static documents describing the structure of the file, rather than highly dynamic documents that change with every revision of every feed.
>
> If we do decide to go with the schema method, there must be a namespace dedicated ONLY to custom elements - nothing else.  That allows predefined structural things like datacasting:dataSource to be added in the future, which is not possible if you simply reserve some names out of the namespace and then open it up to the world.  By the same token, the schema method really requires each feed or set of related feeds to have its own namespace, to avoid any possibility of collision... because collisions have structural implications (e.g. the data type might be different). Chris has asked to see some data on collision probability and I think given the schema implications that's an excellent suggestion.
>
> Furthermore, only one group can "own" a schema at a time, so you cannot have multiple groups sharing one schema without a lot of coordination for updates. With the keyword method, collisions could occur but would not be a problem, because the metadata is already isolated within the channel.
>
> -Sean
>
> -----Original Message-----
> From: Mattmann, Chris A (388J)
> Sent: Friday, March 18, 2011 6:41 AM
> To: Hua, Hook (388C)
> Cc: Mccleese, Sean W (388A); esip-discovery at lists.esipfed.org
> Subject: Re: [Esip-discovery] Datacasting custom elements proposal
>
> Hi Hook,
>
> On Mar 17, 2011, at 11:34 PM, Hua, Hook (388C) wrote:
>
>> Hi Sean and Chris,
>>
>> (1) I recall we had a similar discussion back in the Federated OpenSearch Cluster about whether we should assume that the clients are namespace aware. There could be some readers that are badly written (e.g. not using formal XML parsers) and therefore not compliant. All we can do on the server side is to do the right thing and always use namespaces.
>
> +1.
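A short sketch of why namespace awareness matters, using a placeholder namespace URI. Two feeds bind the same namespace to different prefixes, which is equivalent XML. A namespace-aware lookup finds the element in both, while a reader that matches the literal prefix text (the kind of non-compliant reader mentioned above) misses the second.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical placeholder; the real Datacasting namespace URI may differ.
DC_NS = "http://example.org/datacasting"

# Same namespace, different prefixes: equivalent documents as far as XML is concerned.
feed_a = f'<item xmlns:datacasting="{DC_NS}"><datacasting:dataSource>foo</datacasting:dataSource></item>'
feed_b = f'<item xmlns:dcast="{DC_NS}"><dcast:dataSource>foo</dcast:dataSource></item>'


def namespace_aware(xml_text):
    # Matches on the namespace URI, so the provider's choice of prefix is irrelevant.
    return ET.fromstring(xml_text).findtext(f"{{{DC_NS}}}dataSource")


def prefix_matching(xml_text):
    # A badly written reader that searches for a literal prefix string.
    m = re.search(r"<datacasting:dataSource>(.*?)</datacasting:dataSource>", xml_text)
    return m.group(1) if m else None


print(namespace_aware(feed_a), namespace_aware(feed_b))  # foo foo
print(prefix_matching(feed_a), prefix_matching(feed_b))  # foo None
```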
>
>> (2) Sean, your points on custom tags could be equally valid for OpenSearch and ServiceCasting responses. Your example custom tag for dataSource could also be a custom tag for the other Discovery responses.
>>
>> So should we generalize your proposal across the other Discovery services too? If so, then we probably shouldn't use a "datacasting" namespace as in:
>>
>> <datacasting:dataSource>foo</datacasting:dataSource>
>>
>> But for the sake of making progress, I could see the need to concentrate on one service type at a time. But then again, there are overlaps.
>
> I'd say it might be good to consider the following research questions and to actually generate some #s behind them before going far down any path. I've heard a lot of concern regarding collision. Does anyone know:
>
> 1. what the frequency % of collision is in at least some X use cases? IOW, how often does this happen? I have my own views on this BTW but I'd rather just see some real data points.
> 2. what are the top Y feed readers we're targeting? I would hope the answer is not Y = all of them. Downstream users and consumers can always think of ways to break "standards" and rather than design for those cases, I think resources and effort are best spent designing for the actual ones first (starting small then growing big).
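For question 1, a hedged sketch of how collision figures might be computed over a collection of real feeds, assuming their custom elements are declared with channel-level customEltDef elements under a placeholder namespace URI. It reports the share of names defined by more than one feed and the share defined with conflicting types, the case with structural implications. The make_feed helper and the two sample feeds are invented inputs for illustration only, not measurements.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical placeholder; the real Datacasting namespace URI may differ.
DC_NS = "http://example.org/datacasting"


def collision_stats(feeds):
    """Given {feed_id: feed_xml}, report how often custom element names collide.

    A name is "shared" if two or more feeds define it, and a "type conflict"
    if those feeds also declare different types for it.
    """
    types_by_name = defaultdict(dict)  # name -> {feed_id: declared type}
    for feed_id, xml_text in feeds.items():
        channel = ET.fromstring(xml_text).find("channel")
        for d in channel.findall(f"{{{DC_NS}}}customEltDef"):
            types_by_name[d.get("name")][feed_id] = d.get("type")
    shared = [n for n, per_feed in types_by_name.items() if len(per_feed) > 1]
    conflicting = [n for n in shared if len(set(types_by_name[n].values())) > 1]
    total = len(types_by_name)
    return {
        "names": total,
        "shared_pct": 100.0 * len(shared) / total if total else 0.0,
        "type_conflict_pct": 100.0 * len(conflicting) / total if total else 0.0,
    }


def make_feed(defs):
    """Hypothetical helper: build a minimal feed from (name, type) pairs."""
    elts = "".join(f'<datacasting:customEltDef name="{n}" type="{t}"/>' for n, t in defs)
    return f'<rss xmlns:datacasting="{DC_NS}"><channel>{elts}</channel></rss>'


# Invented sample inputs, only to exercise the function.
stats = collision_stats({
    "feedA": make_feed([("MaxDT_Analysis", "float"), ("SST", "float")]),
    "feedB": make_feed([("MaxDT_Analysis", "int"), ("WindSpeed", "float")]),
})
print(stats)
```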
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann at nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> Phone: +1 (818) 354-8810
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> _______________________________________________
> Esip-discovery mailing list
> Esip-discovery at lists.esipfed.org
> http://www.lists.esipfed.org/mailman/listinfo/esip-discovery

--
Dr. Christopher Lynnes     NASA/GSFC, Code 610.2    phone: 301-614-5185