[Esip-discovery] Relating services to datasets served, and datasets to available services

Tue Aug 30 20:07:57 EDT 2011

Folks,

I'd like to float a solution to two problems that are related to DCP-2:

 - how to have a service cast entry specify what datasets it "serves"
 - how to have a dataset (or collection) cast specify what services are
    available to query, access, or transform each dataset.

The proposed solution is to reuse the two casting standards and the
OpenSearch protocol.

The list of datasets that a particular service allows access to might be
lengthy, and might be computed by a search or semantic lookup process.
Trying to 'name' the datasets in the service cast is problematic, since
it is not clear what names to use.

So the idea is to hide this 'lookup' behind a URL, which could be
an OpenSearch URL for example.

So the scast entry would contain a <link> tag as follows:

<link rel="http://esipfed.org/ns/discovery/1.1/collection#"
     type="application/atom+xml"
     xmlns:esip="http://esipfed.org/ns/discovery/1.1/"
     esip:protocol="http://a9.com/-/spec/opensearch/1.1/"
     href="<specific OpenSearch URL that does the appropriate search>" />

Here I've reused the "collection" URI that has already been defined, but
we could define a more specific one like "collectionsServed".  Either way,
this is the known link (with rel=) to answer that question.

The fact that this <link> is an OpenSearch link is expressed in the
'protocol' attribute, using the usual URI for the versioned OpenSearch
protocol.  Of course, the <link> could be of some other type; e.g. a direct
link to a collection cast.

And this is why I think we need both the 'rel' and 'protocol' attributes.
One needs to specify both that the link's purpose is to answer the
collectionsServed question, and that the protocol for getting the answer is
an OpenSearch query yielding a feed (the collection cast). By using two
attributes, each of these URIs could also be de-referenceable and point to
some additional information.
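
As a minimal sketch of how a client might use the two attributes together,
the Python below picks out the collectionsServed link from a service cast
entry by matching 'rel' and then checking esip:protocol.  The entry, its
title, and the example.org URLs are hypothetical placeholders, and
find_links is an illustrative helper, not part of any cast specification.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ESIP_NS = "http://esipfed.org/ns/discovery/1.1/"
COLLECTION_REL = "http://esipfed.org/ns/discovery/1.1/collection#"
OPENSEARCH_PROTOCOL = "http://a9.com/-/spec/opensearch/1.1/"

# Hypothetical scast entry; the href values are made-up placeholder URLs.
SCAST_ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:esip="http://esipfed.org/ns/discovery/1.1/">
  <title>Example subsetting service</title>
  <link rel="alternate" type="text/html" href="http://example.org/service"/>
  <link rel="http://esipfed.org/ns/discovery/1.1/collection#"
        type="application/atom+xml"
        esip:protocol="http://a9.com/-/spec/opensearch/1.1/"
        href="http://example.org/opensearch?servedBy=example-service"/>
</entry>
"""


def find_links(entry_xml, rel, protocol=None):
    """Return the href of every <link> matching rel and, if given, esip:protocol."""
    entry = ET.fromstring(entry_xml)
    hrefs = []
    for link in entry.findall("{%s}link" % ATOM_NS):
        if link.get("rel") != rel:
            continue  # link serves some other purpose
        if protocol is not None and link.get("{%s}protocol" % ESIP_NS) != protocol:
            continue  # right purpose, but a protocol this client does not speak
        hrefs.append(link.get("href"))
    return hrefs


collections_served = find_links(SCAST_ENTRY, COLLECTION_REL, OPENSEARCH_PROTOCOL)
print(collections_served)
```

Keeping the two checks separate means a client can still recognize what a
link is for even when it does not understand the protocol behind it.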

The beauty of hiding the collectionsServed question behind a search link
is that it nicely reuses the OpenSearch protocol and the collection cast format.
The list of collectionsServed is available on demand, it can change without
having to alter and re-publish the service cast, and metadata describing the
collections is immediately available in the usual feed format.

A GUI that wants to present metadata about the collections served has it
immediately available.  But service metadata and dataset metadata are
strictly separated into their respective casts, and reuse known formats.
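
To illustrate the GUI case, here is a rough sketch of a client that follows
the link and lists the metadata in the returned feed.  The sample feed, the
collection ids, and the fake_fetch stand-in (used in place of a real HTTP
GET) are all assumptions made for the example; an actual collection cast
would carry more fields than the three read here.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical collection cast returned by the OpenSearch link.
SAMPLE_FEED = """\
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Collections served by example-service</title>
  <entry>
    <id>http://example.org/collections/sst-daily</id>
    <title>Daily sea surface temperature</title>
    <summary>Hypothetical gridded SST collection.</summary>
  </entry>
  <entry>
    <id>http://example.org/collections/chl-monthly</id>
    <title>Monthly chlorophyll concentration</title>
    <summary>Hypothetical ocean color collection.</summary>
  </entry>
</feed>
"""


def list_collections(href, fetch):
    """Fetch the feed behind href and return (id, title, summary) per entry."""
    feed = ET.fromstring(fetch(href))
    rows = []
    for entry in feed.findall(ATOM + "entry"):
        rows.append((
            entry.findtext(ATOM + "id", default=""),
            entry.findtext(ATOM + "title", default=""),
            entry.findtext(ATOM + "summary", default=""),
        ))
    return rows


def fake_fetch(url):
    # Stand-in for an HTTP GET so the sketch runs without a network.
    return SAMPLE_FEED


rows = list_collections(
    "http://example.org/opensearch?servedBy=example-service", fake_fetch)
for collection_id, title, summary in rows:
    print(collection_id, "|", title, "|", summary)
```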

Behind the OpenSearch link, the collectionsServed lookup might be a
SQL database lookup, or a SPARQL query, or involve some semantic reasoning.
Implementers are free to innovate any way they want to, but in the meantime
we can move forward with standardizing the casting formats.  If they don't
want to use OpenSearch for this link, they can also choose an alternate
protocol, at the risk of requiring users to understand additional request &
response formats (besides OpenSearch and the collection cast).

The reverse problem can be solved in the same way.

A link to 'servicesAvailable' could appear in each entry in a collection
cast, as in:

<link rel="http://esipfed.org/ns/discovery/1.1/service#"
     type="application/atom+xml"
     xmlns:esip="http://esipfed.org/ns/discovery/1.1/"
     esip:protocol="http://a9.com/-/spec/opensearch/1.1/"
     href="<specific OpenSearch URL that does the appropriate search>" />

This time the OpenSearch query yields a service cast listing the services
available for that dataset, with the usual metadata in a known format.
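
A sketch of the reverse direction, under the same assumptions: walk a
collection cast and record, for each dataset entry, the servicesAvailable
link and the protocol it advertises.  The feed contents and example.org
URLs are hypothetical, and services_links is only an illustrative helper.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ESIP_PROTOCOL_ATTR = "{http://esipfed.org/ns/discovery/1.1/}protocol"
SERVICE_REL = "http://esipfed.org/ns/discovery/1.1/service#"

# Hypothetical collection cast; only the first entry advertises services.
COLLECTION_CAST = """\
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:esip="http://esipfed.org/ns/discovery/1.1/">
  <entry>
    <id>http://example.org/collections/sst-daily</id>
    <title>Daily sea surface temperature</title>
    <link rel="http://esipfed.org/ns/discovery/1.1/service#"
          type="application/atom+xml"
          esip:protocol="http://a9.com/-/spec/opensearch/1.1/"
          href="http://example.org/opensearch?serves=sst-daily"/>
  </entry>
  <entry>
    <id>http://example.org/collections/chl-monthly</id>
    <title>Monthly chlorophyll concentration</title>
  </entry>
</feed>
"""


def services_links(cast_xml):
    """Map each dataset id to (href, protocol) of its servicesAvailable link."""
    feed = ET.fromstring(cast_xml)
    result = {}
    for entry in feed.findall(ATOM + "entry"):
        dataset_id = entry.findtext(ATOM + "id")
        for link in entry.findall(ATOM + "link"):
            if link.get("rel") == SERVICE_REL:
                result[dataset_id] = (link.get("href"),
                                      link.get(ESIP_PROTOCOL_ATTR))
    return result


services = services_links(COLLECTION_CAST)
for dataset_id, (href, protocol) in services.items():
    print(dataset_id, "->", href, "via", protocol)
```

Entries without such a link are simply absent from the result, so a client
can tell "no services advertised" apart from an empty search result.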

This also explains why I have been arguing for standardizing two attributes
in the <link> tag:  rel and esip:protocol.  Given this flexibility and extra
power, we can design solutions to thorny problems like the ones above.  And
all of our URIs will be cleanly defined or reused from W3C, and ultimately
could be de-referenceable for more information if that serves additional
purposes.

It occurs to me that we might think about defining 'vendor-specific'
MIME types for the casting (extended Atom) formats.  For example:
"application/vnd.esip.discovery.cast.collection" and
"application/vnd.esip.discovery.cast.service".  However, this seems a
bit ugly and perhaps counterproductive.  For most purposes, it
will be better to use the generic Atom or RSS MIME type that more
software will know.

 -- Brian