[Esip-preserve] DataCite to require "landing pages"

Thu Mar 1 13:55:11 EST 2012

Curt says:
"There are so many different ways to do some of these things, I'm kind of
struggling to figure out the best ways for our domain.  I could
definitely use some guidance and think this is a place ESIP could come
together and capture some guidelines/recommended practices."

This problem was what prompted my query re: development of a testbed for
systematic analysis & comparison of alternative approaches...?

NSF had a testbed program about 10 years ago -- SEE:
http://www.nsf.gov/pubs/2003/nsf03538/nsf03538.htm
and a report:
http://research.microsoft.com/en-us/um/people/bahl/Papers/Pdf/testbed_workshop_report_final.pdf

which says:
"network research testbeds have been the crucial proving grounds in which
new networking research ideas could be tested, stressed, observed,
reformulated, and ultimately proven before making their way into
operational systems."

the report goes on to note:
"FINDING 4: Successful past research testbed efforts. Network testbeds have
a proven track record of providing the means by which important research
advances in our community could be tested, stressed, observed, refined, and
ultimately proven. Important representative efforts are discussed in
Section 4.

FINDING 5: Unsuccessful past research testbed efforts. Network research
testbeds can also fail (in the sense of not producing research insights that
inform the community) for many reasons, including being overwhelmed by the
operational aspects of building and maintaining the testbed, failure to
publish or distribute results and/or software, and lack of focus on the
testbed’s research goals."

Food for thought...?

Tom

Tom Moritz
1968 1/2 South Shenandoah Street,
Los Angeles, California 90034-1208  USA
+1 310 963 0199 (cell) [GMT -8]
tommoritz (Skype)
http://www.linkedin.com/in/tmoritz

“Πάντα ῥεῖ καὶ οὐδὲν μένει” (Everything flows, nothing stands still.)
-- Heraclitus
"It is . . . easy to be certain. One has only to be sufficiently vague."
-- C.S. Peirce
"Kathambhutassa me rattindiva vitipatanti" (“The days and nights are
relentlessly passing; how well am I spending my time?”)
-- "Ten Subjects for Frequent Recollection by One Who Has Gone Forth"
"Il faut imaginer Sisyphe heureux."  ("One must imagine Sisyphus happy.")
-- Camus

On Thu, Mar 1, 2012 at 10:30 AM, Curt Tilmes <Curt.Tilmes at nasa.gov> wrote:

> On 03/01/2012 12:16 PM, Greg Janée wrote:
>
>> On Feb 29, 2012, at 2:34 PM, Mark A. Parsons wrote:
>>
>>> Landing pages should ultimately be both human and machine readable.
>>>
>>
>> I've always thought this would be the best of both worlds, as both
>> humans and programmatic clients can then get representations of
>> resources that they can do something with.  But we seem to be
>> hampered by the lack of an agreed-upon technical approach.  Is this
>> something that ESIP might like to look at?  Two approaches that have
>> been suggested:
>>
>> 1. Content negotiation.  Clients use HTTP's Accept header mechanism to
>> request a specific representation of a resource: RDF, OAI-ORE, etc.
>> DataCite has put together an alpha version of this concept at
>> http://data.datacite.org/
>> .  I don't think there are (yet) any guidelines as to what resource
>> types return what information in what ways.
>>
>
> More information here:
>
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html
> http://www.w3.org/TR/webarch/#def-coneg
> http://www.w3.org/QA/2006/02/content_negotiation.html
>
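A minimal sketch of the content negotiation approach in Python. The URL
http://example.org/dataset/FOO is a hypothetical placeholder, not a real
service. The code only builds the request and prints the Accept header that
would be sent; nothing goes over the network.

```python
from urllib.request import Request


def build_negotiated_request(url, media_type):
    """Build a GET request that asks for one representation via Accept."""
    return Request(url, headers={"Accept": media_type})


# Hypothetical resource URL, used only for illustration.
request = build_negotiated_request("http://example.org/dataset/FOO",
                                   "application/rdf+xml")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to urllib.request.urlopen() would actually send it. What
comes back depends entirely on what the server chooses to support.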
>
>> 2. Identifier "inflections".  This is an idea proposed by John Kunze
>> at CDL.  A client can request a specific representation by adding a
>> syntactic cue to the identifier.  For example, it's already part of
>> the ARK specification that appending a question mark (?) to an
>> identifier returns metadata; perhaps appending a slash (/) requests
>> a "landing page" or other human-oriented experience as opposed to the
>> resource directly.
>>
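A rough sketch of how a resolver might interpret such inflections. Only the
"?" rule is described above as part of the ARK specification; the "/" rule is
the "perhaps" from the proposal. The function name, the labels it returns,
and the identifier ark:/12345/xyz are all made up for illustration.

```python
def split_inflection(identifier):
    """Split a trailing inflection off an identifier.

    Returns (base_identifier, requested_view).
    """
    if identifier.endswith("?"):
        # Described above as existing ARK behaviour: return metadata.
        return identifier[:-1], "metadata"
    if identifier.endswith("/"):
        # Only a suggestion in the proposal: a human-oriented landing page.
        return identifier[:-1], "landing-page"
    # No inflection: the resource itself.
    return identifier, "resource"


for example in ("ark:/12345/xyz", "ark:/12345/xyz?", "ark:/12345/xyz/"):
    print(example, "->", split_inflection(example))
```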
>
>
> RFC 5988 (http://tools.ietf.org/html/rfc5988) has a way to add an
> extra "Link" HTTP header to describe the relationship between URIs.
>
> We could use some special link between a "URI to a web page about some
> data" and "URI to retrieve the data".  (Maybe that is already
> defined?)  This seems related to some of the discovery work with their
> Atom 'link's.
>
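A small sketch of what such a Link header could look like, and how a client
might read it. The relation name "describedby" and the example.org URLs are
assumptions chosen for illustration; whether an existing relation type really
fits the data/landing-page case is the open question above. The parser is
deliberately simplified: it expects rel to be the first parameter and to hold
a single value.

```python
import re


def format_link(target, rel):
    """Format one RFC 5988 style Link header value."""
    return '<{}>; rel="{}"'.format(target, rel)


# Simplified pattern: <target>; rel="name"
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def parse_links(header_value):
    """Return a {rel: target} dict from a Link header value."""
    return {rel: target for target, rel in _LINK_RE.findall(header_value)}


# Hypothetical: a response for the data URI points at its landing page.
header = format_link("http://example.org/dataset/FOO.html", "describedby")
print("Link:", header)
print(parse_links(header))
```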
>
> One issue people talk about a lot with identifiers and landing pages
> is distinguishing the URI for a thing and the URI for a web page of
> information about a thing.
>
> Once you start asserting facts, you have things like:
>
> "FOO Instrument" "created by" "Fred Smith"
> "FOO Instrument" "created on" "1999"
> "FOO Instrument Web Page" "created by" "Jane Doe"
> "FOO Instrument Web Page" "created on" "2012"
>
> you need two distinct identifiers (URIs) for those two distinct
> things.
>
>
> Suppose I use this identifier for the FOO Instrument:
>
>    http://somewhere/instrument/FOO
>
> and I use this identifier for the web page about FOO:
>
>    http://somewhere/instrument/FOO.html
>
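To make the distinction concrete, here is a sketch that stores the four
assertions above as plain (subject, predicate, object) tuples keyed on the
two URIs. The tuple layout and the helper facts_about() are illustrative
only, not a real RDF model.

```python
INSTRUMENT = "http://somewhere/instrument/FOO"
WEB_PAGE = "http://somewhere/instrument/FOO.html"

triples = [
    (INSTRUMENT, "created by", "Fred Smith"),
    (INSTRUMENT, "created on", "1999"),
    (WEB_PAGE, "created by", "Jane Doe"),
    (WEB_PAGE, "created on", "2012"),
]


def facts_about(subject, all_triples):
    """Collect predicate -> object for a single subject URI."""
    return {p: o for s, p, o in all_triples if s == subject}


print(facts_about(INSTRUMENT, triples))
print(facts_about(WEB_PAGE, triples))
```

With a single shared identifier, the two "created by" facts would collide and
there would be no way to tell which one describes the instrument.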
> Of course when I resolve the former, I still want to see the landing
> page of information.
>
> Some people simply redirect from http://somewhere/instrument/FOO ->
> http://somewhere/instrument/FOO.html
>
>
> I like the visual distinction of a redirect using two distinct URLs,
> rather than simply returning different information based on the Accept
> header (which we could also do).  It's also easier for me to test/try
> things out by adding ".html" or ".rdf" to the URL in my web browser
> than it is to play with changing the Accept header.
>
>
> Ok, suppose I have a dataset (uh, I mean structured collection of
> data), identified by
>
>   doi:10.001/FOOL1B.v001
>
> which I can map to a useful URI:
>
>   http://dx.doi.org/10.001/FOOL1B.v001
>
> Whenever someone resolves that URI, I want to give them something
> useful.  The question is what do I give them?  If FOOL1B.v001 is a
> collection of dozens (hundreds? thousands?) of granules, it doesn't
> really make sense to point directly to a single one of them.
>
> DOI allows us to point that DOI to anywhere we like.  Best (uh, good?
> recommended?) practice seems to be to point it to some sort of landing
> page, perhaps:
>
>    http://some.data.center/FOO/FOOL1B/FOOL1B.v001.html
>
> On that page, you get all the information about the dataset, and perhaps
> links to the actual data, or at least an ordering interface
> (e.g. http://nsidc.org/data/mod10_l2v5.html).
>
>
> Now think about an RDF representation of information about that
> dataset.  There is structured information on that page, so we should
> be able to express that information using something like RDF (or XML,
> JSON, etc.) and get it back directly via content negotiation.
>
>
> If we resolved the URI directly ourselves, it would be pretty easy: you
> just redirect to the RDF page about that URI.  But using DOI, it seems
> like the redirection happens before it hits our page, right?
>
> (How does the dx.doi.org resolver work?  Is there a way to log
> multiple redirects with them?  Or does it happen at a level above the
> specific DOI, so my own resolver can get in there?)
>
>
> Anyway, it seems to me that if you requested
>
>   http://dx.doi.org/10.001/FOOL1B.v001
>
> with
>
>   Accept: application/rdf+xml
>
> doi.org would still redirect to
>
>   http://some.data.center/FOO/FOOL1B/FOOL1B.v001.html
>
> (Is that right?)
>
> Which you would again request, still with the RDF Accept, and the web
> server at some.data.center could then redirect you to
>
>   http://some.data.center/FOO/FOOL1B/FOOL1B.v001.rdf
>
> and dump out that structured information in RDF.
>
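A sketch of the second hop described above: the data center's server looks at
the Accept header on the landing-page request and picks the URL to redirect
to. The suffix table and the function name are assumptions for illustration,
and q-values in the Accept header are ignored for brevity. How dx.doi.org
itself treats the header is not modelled here.

```python
# Assumed mapping from media type to file suffix; not from any standard.
SUFFIX_FOR_TYPE = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
}


def redirect_target(requested_url, accept_header):
    """Pick the suffixed URL matching the first recognised media type."""
    # Strip a known suffix so FOO.html and FOO share the same stem.
    stem = requested_url
    for suffix in SUFFIX_FOR_TYPE.values():
        if stem.endswith(suffix):
            stem = stem[:-len(suffix)]
            break
    # Walk the Accept header in the order the client listed its types.
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in SUFFIX_FOR_TYPE:
            return stem + SUFFIX_FOR_TYPE[media_type]
    # Nothing recognised: fall back to the human-readable page.
    return stem + ".html"


landing = "http://some.data.center/FOO/FOOL1B/FOOL1B.v001.html"
print(redirect_target(landing, "application/rdf+xml"))
print(redirect_target(landing, "text/html,application/xhtml+xml;q=0.9"))
```

When the chosen target equals the requested URL, a real server would
presumably serve the page directly rather than redirect again.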
>
> Now look back at the instrument identifiers above, where
>
> http://somewhere/instrument/FOO is re-directed to the landing page
> http://somewhere/instrument/FOO.html
>
> What URL gives me RDF information about FOO, and what gives me RDF
> information about the web page of information about FOO?
>
> What comes from URL http://somewhere/instrument/FOO.rdf ?
>
>
> I also like the 'inflections' described above, especially with
> the multiple 'layers' of aggregation (like Ruth has been working with)
> -- I think we need consistent, algorithmic ways to express those
> aggregations clearly.
>
>
> Sorry about the rambling... there is a lot here we can discuss.  There
> are so many different ways to do some of these things, I'm kind of
> struggling to figure out the best ways for our domain.  I could
> definitely use some guidance and think this is a place ESIP could come
> together and capture some guidelines/recommended practices.
>
> Curt
>
>
> --
> Curt Tilmes, Ph.D.
> U.S. Global Change Research Program
> 1717 Pennsylvania Avenue NW, Suite 250
> Washington, D.C. 20006, USA
>
> +1 202-419-3479 (office)
> +1 443-987-6228 (cell)
>