[Esip-discovery] Datacasting custom elements proposal
Ramirez, Paul M (388J)
paul.m.ramirez at jpl.nasa.gov
Thu Mar 31 12:25:16 EDT 2011
Hey All,
Hopefully I can point you to a few more answers. I'm not sure of the larger context of this discussion, so I hope these answers apply. See my comments inline below:
On Mar 30, 2011, at 9:53 PM, Mattmann, Chris A (388J) wrote:
> Hi Bob,
>
>>> Yep that's one way to do it. You could also put the above inline in the XML document itself. Check out that link I sent you on inlining schema in XML document instances:
>>>
>>> http://msdn.microsoft.com/en-us/library/aa302288.aspx
>>
>> I saw that before but tuned out as soon as I saw "Few other XML Schema
>> implementations besides those by Microsoft actually do support inline
>> schemas." ;-)
>
> :) I hear ya.
>
>>>
>>> You could also use e.g., an xsd:annotation too. But you'd also reference the dc: namespace in your XML schema which would reference back to the definition for conforms-to-standard.
>>
>> xsd:appinfo has to go inside xsd:annotation as far as I can tell.
>>
>> How do you go about telling the validator (validating the schema itself)
>> that e.g. dc:conforms-to-standard can appear only in the xsd:appinfo
>> element and not somewhere else? Something has to glue the two
>> namespaces together. That's what I'm not getting.
>
> OK I just CC'ed Paul and gave up on getting him to join the list. Hey Paul can you chime in on the above? :)
Here's what you do. In your schema definition, import the dc namespace and declare its prefix on the root element. The xsd:appinfo element is defined with an open content model (an xsd:any wildcard) with processContents set to lax, so if a schema for the contents is available, the validator should try to validate them. So something like this:
<xsd:schema xmlns:data="data caster namespace"
            xmlns:dc="dublin core namespace"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="data caster namespace">
  <xsd:import namespace="dublin core namespace" schemaLocation="...."/>
</xsd:schema>
Now when you use the dc elements inside the xsd:appinfo element, they should be validated. If you want to further constrain which elements can appear in the appinfo, I would construct a schema that defines a config element and specifies exactly which elements it may contain. I haven't actually tried this before (in the context of the appinfo element), so it might take a tweak or two to make it work, but that is my understanding of the spec. Also note that schemaLocation is always just a hint, so in operations the parser can remap the URI that appears there to a local resource if one is provided to it.
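To make the config element idea concrete, here is a minimal sketch of such a schema. The namespace URI urn:example:datacasting-config and the child element name are placeholders invented for illustration; nothing here comes from an existing datacasting or Dublin Core schema, and it is untested in the appinfo context.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical helper schema. The namespace URI and the element
     names are assumptions made for this sketch. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:example:datacasting-config"
            elementFormDefault="qualified">

  <!-- The one element intended to sit directly inside xsd:appinfo. -->
  <xsd:element name="config">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Closed content model: only the children listed here are
             allowed, so a misspelled child name fails validation. -->
        <xsd:element name="conforms-to-standard" type="xsd:anyURI"
                     minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

The reason for the wrapper is that lax processing only validates elements for which a declaration can be found; an unrecognized element placed directly in xsd:appinfo is skipped rather than flagged. Inside config the content model is closed, so something like conforms-2-standard would be reported as an error, provided the validator actually picks up this schema.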
>
>>
>>>> Also, pardon my schema ignorance. But if you have a well-defined
>>>> schema, say for datacasting, can that be augmented via additional
>>>> schemas (namespaces?) *without* changing the datacasting schema itself
>>>> and still be validateable? (which is really the same as the question
>>>> above, about having a schema just for the additions to the schema's
>>>> schema (did I really just write that sentence??))
>>>
>>> LOL. I think James G's answer is what I would say too -- it's possible to do, especially with inline schemas, and with the ability to override definitions.
>>
>> My understanding of James' answer was that it was a way to specify that
>> arbitrary XML could go in the document itself. I'm trying to add things
>> to the schema language here, to extend the actual schema document.
>>
>> Also, maybe I should qualify "validateable" by saying that I mean not
>> just that it's well-formed XML but that the contents can be checked. I
>> mean, xsd:appinfo allows arbitrary XML, and by that token any schema
>> using it that is well formed could be validated, technically. But
>> that's not a real validation. A true validation ensures that the
>> document follows the rules... in this case, that the schema extensions
>> in appinfo follow the specifications we set up. If someone misspells
>> it, e.g. "dc:conforms-2-standard" then that should be flagged by a
>> validator.
>
> James, is that your take?
It seems like part of what you guys are trying to do would be helped along by designing your schema with an open content model. Essentially, this comes down to baking in some <xsd:any> and/or <xsd:anyAttribute> wildcards and setting the processContents attribute appropriately (the default is strict, which means a definition must be provided).
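As a rough illustration of an open content model, the sketch below declares one element with a fixed part followed by wildcards. The namespace urn:example:datacasting and the element names item and title are made up for the example and are not taken from the real datacasting schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical core schema; all names and the namespace URI are
     placeholders for illustration. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:example:datacasting"
            elementFormDefault="qualified">

  <xsd:element name="item">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Fixed, core part of the content model. -->
        <xsd:element name="title" type="xsd:string"/>
        <!-- Extension point: any number of elements from other
             namespaces. With lax, they are validated only when a
             declaration for them is available. -->
        <xsd:any namespace="##other" processContents="lax"
                 minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <!-- Same idea for attributes from other namespaces. -->
      <xsd:anyAttribute namespace="##other" processContents="lax"/>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

Restricting the wildcard to ##other keeps extensions out of the core namespace. Switching processContents to strict would instead require every extension element to have a declaration the validator can find.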
>
>>
>>
>>>> Put another way, are schemas subclassable in an object-oriented sense?
>>>> The goal is for the above to still be a valid a, with additions
>>>> specified by b, without modifying or duplicating a's schema, yet still
>>>> be fully validateable.
>>>
>>> Yep, they are able to do that. The guru I know for XML schemas is Paul Ramirez on the PDS Engineering Node. He's been doing most of the design implementation of stuff like this at JPL. I've been trying to get him to join the discussion...if anyone else knows Paul, help me encourage him :)
>>
>> Can you find an example?
>
> I'll defer to Paul on this.
Yes, they are, but things can get hairy, especially with the XML Schema 1.0 standard, because restrictions on a given element can only be made within that element's namespace. You will quickly run into this when you import elements/types from another namespace and want to restrict them; the issue is addressed in XML Schema 1.1. In addition, restrictions end up looking ugly, because the content model of the type being restricted is copied over into the new type. This leads one to ask why not just define a new type that "conforms" to the other and validate against it. Content model extensions are fairly easy: you can either use the xsd:extension element or consider using an open content model as described above.
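For the extension case, a sketch along these lines shows the general shape. It assumes the base schema exposes a named complex type, here called ItemType, with a closed content model; that type name, both namespace URIs, the file name datacasting.xsd, and the added instrument element are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical provider schema that "subclasses" a base type
     without touching the base schema. All names are placeholders. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:base="urn:example:datacasting"
            xmlns:ext="urn:example:provider-items"
            targetNamespace="urn:example:provider-items"
            elementFormDefault="qualified">

  <!-- Pull in the base definitions; the location is only a hint. -->
  <xsd:import namespace="urn:example:datacasting"
              schemaLocation="datacasting.xsd"/>

  <!-- Derived type: everything in base:ItemType, followed by the
       provider's own additions. -->
  <xsd:complexType name="ProviderItemType">
    <xsd:complexContent>
      <xsd:extension base="base:ItemType">
        <xsd:sequence>
          <xsd:element name="instrument" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="item" type="ext:ProviderItemType"/>

</xsd:schema>
```

Extension always appends to the end of the base content model, so the added elements appear after the inherited ones. An instance can also keep the base element and select the derived type with xsi:type, which is about as close to object-oriented subclassing as the schema language gets.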
Really, what you should consider is who, what, when, where, why, and how you want people to extend your schema. As things solidify, I think you may find that fewer people will be changing and extending the core data caster schema than you might first think, especially if you define up front the areas where people are allowed to extend (open content model) and then rely on them to provide validation for those elements. Stated more clearly: the XML instance documents themselves can point to the definitions for the elements that don't appear in the data caster schema.
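As an illustration of an instance document pointing to its own definitions, the sketch below uses xsi:schemaLocation to pair each namespace with a schema. The namespaces, file names, and element names are invented for the example, and it assumes the core schema leaves a wildcard slot after title and that the provider schema declares instrument as a global element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance document. Each pair in xsi:schemaLocation
     is "namespace location"; the locations are hints to the parser. -->
<dcast:item xmlns:dcast="urn:example:datacasting"
            xmlns:prov="urn:example:provider-metadata"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:example:datacasting datacasting.xsd
                                urn:example:provider-metadata provider-metadata.xsd">

  <!-- Defined by the core schema. -->
  <dcast:title>Example granule</dcast:title>

  <!-- Not in the core schema: admitted by its wildcard and checked
       against the provider's own schema when that schema is found. -->
  <prov:instrument>Example instrument</prov:instrument>

</dcast:item>
```

With this arrangement the core schema never changes; the provider ships a small schema for its own namespace and the instance ties the two together.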
Finally, future data caster feeds may need to evolve to support new functionality (i.e., new elements and/or attributes). If you plan for that by partitioning new functionality into a new namespace, then older tools and schemas can still work against new content; an open content model is what makes this possible. This is not to say that older tools and/or schemas would use the new content for anything, but rather that it would not stop them from doing what they normally do.
>
>>
>> What I'm trying to do here is to poke at the schema definition to see if
>> it can actually accommodate the metadata descriptions (m-a-m) we're
>> envisioning here. If not, then we can discard schemas as an option. If
>> it's possible but difficult, that's good to know. It's also good to
>> know if it's easy.
>>
>> So what I'm getting at is twofold:
>>
>> 1) can the schema language support additional m-a-m items (like
>> conforms-to-standard) in the schema itself that can themselves be
>> validated? So that we can ensure the schema itself is valid? (this
>> goes to the schema that describes the schema.)
>>
>> 2) can a data provider extend the datacasting (e.g.) schema to add
>> his/her own custom metadata elements, without having to redefine the
>> base schema? i.e. as an augmentation or "subclass"?
>>
>> If either answer is no, then I don't think the schema language will work
>> for this application. If both answers are yes, then examples would be
>> useful to help us decide if it's really the right way to go.
>>
>> I'm all for standards here. Even though I have some vested interest in
>> the datacasting model, if using the schema language really is the
>> "proper" thing to do, then we should do it. But we have to ensure that
>> it will actually work, and truly is the "proper" thing.
>>
>> Thanks...
>
> Gotcha, we'll flesh this out...
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann at nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> Phone: +1 (818) 354-8810
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
Hope this helps,
Paul Ramirez
(818) 354-1015
P.S. Where can I sign up to this mailing list?