[Esip-discovery] [esip-semanticweb] time for new challenges?

Tue Jan 8 08:52:46 EST 2013

Hook and Eric, 
  Meet at break?

--
Christopher Lynnes, NASA/GSFC
301-614-5185
________________________________________
From: esip-discovery-bounces at lists.esipfed.org [esip-discovery-bounces at lists.esipfed.org] On Behalf Of Eric Rozell [rozele at rpi.edu]
Sent: Tuesday, January 08, 2013 8:13 AM
To: esip-discovery at lists.esipfed.org
Subject: Re: [Esip-discovery] [esip-semanticweb]  time for new challenges?

Have we come up with a list of "perspectives" or "slices" for the breakout groups to emphasize at the early morning session tomorrow for Discovery Grand Challenges?  If not, how should we go about assembling this list for tomorrow?  It's possible we could spend the first few minutes of the session brainstorming these "perspectives" as a group.

On Jan 3, 2013, at 2:57 PM, Lynnes, Christopher S. (GSFC-6102) wrote:

On Jan 3, 2013, at 11:33 AM, Sky Bristol wrote:

If you haven't run across it yet, Google has a slick tool for viewing how their engine picks up structured data in pages.

http://www.google.com/webmasters/tools/richsnippets

Cool!

We've been excited to watch developments in the people, places, and things world of the Knowledge Graph as we've worked fairly hard to connect dots between these known entities in our world. Embedding that information and connections gained from our private corporate data assets into publicly exposed web pages seems like a nice way to share information from our somewhat behemoth government systems. We're doing some work on this right now in our ScienceBase cataloging system based on the schema.org (http://schema.org/) stuff and some recent rumblings in gov circles about a way of exposing all agency data holdings using RDFa and some evolving notions of a JSON-based indexing system (similar to sitemaps). We're launching the first iteration of our schema.org embedded attribution in our couple million item summary pages at www.sciencebase.gov, but we're still working through some goofy issues like distinguishing between people and organizations for FOAF since we abstracted them to "parties" in our embedded contact schema (duh!). We'll be tracking before and after analytics and other log analyses to see how things go, but the fundamental idea we are pursuing is to push our information well beyond our doors so that our ScienceBase API is not the only way of getting at and interacting with our stuff.
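
As a rough illustration of the people-versus-organizations issue Sky describes, the Python sketch below renders a contact "party" as an RDFa Lite fragment typed with schema.org's Person or Organization class. It assumes each party record carries an explicit kind field; the field names (name, kind, email) and the example values are hypothetical, not ScienceBase's actual contact schema. Refusing to type a party whose kind is unknown is a design choice here, on the reasoning that a wrong type is worse than none.

```python
from html import escape

# Hypothetical mapping from an explicit "kind" on each contact party to a
# schema.org class; the field names are illustrative, not a real schema.
SCHEMA_TYPE_BY_KIND = {
    "person": "Person",
    "organization": "Organization",
}


def party_to_rdfa(party: dict) -> str:
    """Render one contact party as an RDFa Lite fragment using schema.org.

    Raises ValueError when the party's kind is missing or unknown, rather
    than guessing between Person and Organization.
    """
    kind = party.get("kind")
    if kind not in SCHEMA_TYPE_BY_KIND:
        raise ValueError(f"cannot type party {party.get('name')!r}: kind={kind!r}")
    lines = [
        f'<div vocab="http://schema.org/" typeof="{SCHEMA_TYPE_BY_KIND[kind]}">',
        f'  <span property="name">{escape(party["name"])}</span>',
    ]
    if party.get("email"):
        lines.append(f'  <span property="email">{escape(party["email"])}</span>')
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example record.
    print(party_to_rdfa({"name": "Example Science Center", "kind": "organization"}))
```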

We're interested in the same questions you pose and will try to keep up with the conversation. One of our next experiments is to release a KML encoding (format=kml on the end of a RESTful URL) of the georeferenced items in our catalog and publish geositemaps (in batches of 50K items) to see how that impacts discovery through map contexts. After that we plan to build on some work we did a while back with www.freebase.com and push a bunch of our items and connections between items into that venue as well. I'd be interested in hearing from anyone else who is doing any work along these lines.
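
A minimal sketch of the batching step Sky mentions, assuming item URLs are already known and that appending format=kml to each URL returns its KML encoding (the example host catalog.example.org is hypothetical). It follows the sitemaps.org protocol, whose per-file limit of 50,000 URLs matches the batch size above; it emits only plain sitemap entries, and any geo-specific sitemap extension tags would have to be added per the search engine's own documentation.

```python
from urllib.parse import urlsplit, urlunsplit
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit in the sitemaps.org protocol


def with_kml_format(item_url: str) -> str:
    """Append format=kml to an item URL, keeping any existing query string."""
    scheme, netloc, path, query, fragment = urlsplit(item_url)
    query = f"{query}&format=kml" if query else "format=kml"
    return urlunsplit((scheme, netloc, path, query, fragment))


def batches(items: list, size: int = MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of items holding at most `size` entries each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sitemap_xml(urls) -> str:
    """Render one sitemaps.org <urlset> document; URLs are XML-escaped."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<urlset xmlns="{SITEMAP_NS}">']
    lines += [f"  <url><loc>{escape(url)}</loc></url>" for url in urls]
    lines.append("</urlset>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical catalog host and item IDs, for illustration only.
    item_urls = [f"https://catalog.example.org/item/{i}" for i in range(120_001)]
    files = [sitemap_xml([with_kml_format(u) for u in batch]) for batch in batches(item_urls)]
    print(len(files), "sitemap files")
```

For example, 120,001 item URLs would yield three sitemap files holding 50,000, 50,000, and 20,001 entries; those files would then normally be listed in a sitemap index file and submitted to the search engines.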

Cheers.

Sky

thx for the info, Sky!  We may want to get a more thorough briefing from you in a later cluster telecon, either Discovery or Semantic Web...

On Jan 3, 2013, at 8:40 AM, "Lynnes, Christopher S. (GSFC-6102)" <christopher.s.lynnes at nasa.gov> wrote:

I admit to being slow to see how servicecasting would solve our search engine discovery problem, but after running across the Google video about their Knowledge Graph (http://www.youtube.com/watch?v=mmQl6VGvX-c), I think I am finally starting to see the light:

Instead of attacking Search Engine Optimization in the traditional way, maybe we should target the knowledge graph efforts of Bing and Google.  That is, make our information available in a way that the knowledge graph picks it up.  Hence the inclusion of the Semantic Web group in this message.

If we go that way, there are several things we need to understand about how Google/Bing consume semantic web information, e.g.:
a) encoding: RDF/XML, RDFa or something else?
b) data model:  what classes and/or properties are the search engines paying attention to?
c) graph traversal:  how and how far do the engines traverse the graph for any given search or result?
d) instance preferences:  are there particular instance entities that search engines weight preferentially?
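
To make question (c) concrete, "how far" can be read as a hop limit on a walk through linked triples. The Python sketch below is a plain breadth-first search with a depth cutoff over a toy graph; the ex: identifiers are hypothetical, and nothing here describes how Google or Bing actually traverse the graph, which is exactly the open question.

```python
from collections import deque

# Toy (subject, predicate, object) triples; all identifiers are hypothetical
# and only serve to illustrate hop counting.
TRIPLES = [
    ("ex:service", "ex:serves", "ex:dataset"),
    ("ex:dataset", "ex:measuredBy", "ex:instrument"),
    ("ex:instrument", "ex:onPlatform", "ex:platform"),
    ("ex:platform", "ex:operatedBy", "ex:agency"),
]


def reachable(start: str, triples, max_hops: int) -> set:
    """Breadth-first search along subject->object links, stopping after max_hops."""
    neighbors = {}
    for subj, _, obj in triples:
        neighbors.setdefault(subj, []).append(obj)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


if __name__ == "__main__":
    for hops in range(5):
        print(hops, sorted(reachable("ex:service", TRIPLES, hops)))
```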

Note that this implies solving the hooking of servicecasting to data, with linked open data (LOD) as a component of the solution. I don't know if that is currently on the radar screen, but my impression from hearing Brian Wilson talk about publishing LOD is that it is on the radar for servicecasting.

Also, taking this LOD approach has benefits outside of Search Engine Optimization.  It might even be possible to take off from what we learn from the ToolMatch effort.

On Dec 19, 2012, at 12:24 PM, Ruth Ellen Duerr wrote:

OK, so I have to weigh in here.  This is, after all, the whole point of service casting hooked to data and the Libre project in general.  I think we made huge strides in this area over the last couple of years.  Yes, there are a few issues, crawl frontier management and the data equivalent of page rank being the major ones; but other than that, we have demonstrated the ability to find getCapabilities docs, OSDDs, web-enabled folders, OAI-PMH catalogs, etc. wherever they are on the web.  And building a Google for data is a major interest of mine.  I've even come up with what I think could be that killer app, though I am totally uninterested in starting a startup, even if Boulder is a hub for doing that (yes, it is a cheap for pay kind of idea).
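
As one naive starting point for the "data equivalent of page rank" (an assumption for illustration, not the Libre project's approach), classic PageRank could be run over the link graph among discovered data endpoints. The sketch below uses power iteration with the conventional damping factor of 0.85 and spreads the rank of dangling nodes (endpoints with no outgoing links) evenly over all nodes; the node names in the example are hypothetical.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Classic PageRank by power iteration.

    links maps each node to the list of nodes it links to. Dangling nodes
    (no outgoing links) spread their rank evenly over all nodes, so the
    ranks always sum to 1.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        # Teleport share plus the evenly spread rank of dangling nodes.
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical endpoints: a catalog page, an OSDD, and a getCapabilities doc.
    graph = {
        "catalog": ["osdd", "wms_capabilities"],
        "osdd": ["wms_capabilities"],
        "wms_capabilities": [],
    }
    for node, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(f"{node}: {score:.3f}")
```

In this toy graph the getCapabilities document, linked from both other endpoints, ends up with the highest rank.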

So, yes this is a good topic; but it is one we are already addressing albeit very slowly in the Discovery cluster.

Ruth

On Dec 19, 2012, at 8:37 AM, "Ramirez, Paul M (388J)" <paul.m.ramirez at jpl.nasa.gov> wrote:

Hi Chris,

Seems like a worthy and tractable cause to help those within the
organization to understand how to increase the visibility of their web
applications and sites. One of the first steps could be to understand how
a search engine views an application or site. For this there are already a
large variety of tools[1][2][3]. These tools of course are just a portion
of the solution but could help frame the problem. That said, is this an
ESIP level issue or more at the level of each organization? Moreover, it
seems as though understanding one's web presence is an ongoing task,
as it would evolve as the search engines evolve and as we bring online
more web applications. Anyhow, just some thoughts; I'm all for supporting
an activity that gets exposure for the great applications that have already
been built.
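
One small, checkable piece of how a search engine views an application is whether the site's robots.txt lets a crawler reach the application's pages at all. The sketch below uses Python's standard urllib.robotparser module; the robots.txt content, paths, and user agent are hypothetical, and robots.txt is only one of many inputs a crawler weighs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; a real check would fetch the site's own file
# (RobotFileParser(url) followed by read()).
ROBOTS_TXT = """\
User-agent: *
Disallow: /app/cgi-bin/
"""


def crawlable(robots_lines, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = ROBOTS_TXT.splitlines()
    for url in ("https://example.org/app/about.html",
                "https://example.org/app/cgi-bin/plot?var=precip"):
        print(url, crawlable(rules, "Googlebot", url))
```

If the pages that expose an application's data content sit under a disallowed path, no amount of on-page markup will help them get indexed.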

Thanks,
Paul Ramirez
Jet Propulsion Laboratory
(818) 354-1015

[1] https://www.google.com/webmasters/tools/home?hl=en&pli=1
[2] http://support.google.com/webmasters/bin/answer.py?hl=en&answer=158587
[3] http://www.screamingfrog.co.uk/seo-spider/

On 12/19/12 5:58 AM, "Lynnes, Christopher S. (GSFC-6102)"
<christopher.s.lynnes at nasa.gov> wrote:

Here is one not obviously related to OpenSearch...Faisal Hossain (this
year's winner of the Falkenberg!) wrote an editorial in BAMS lamenting
the difficulty of discovering useful web applications (e.g., Giovanni)
via major search engines by the applications' data content:
http://journals.ametsoc.org/doi/full/10.1175/BAMS-D-12-00035.1

Hook and I have talked with Microsoft, and indirectly with Google, and
there appears to be no quick and easy silver bullet.

This problem appears to be widespread, not just Giovanni. Is this
something the Discovery Cluster should tackle?
--
Dr. Christopher Lynnes, NASA/GSFC, ph: 301-614-5185

_______________________________________________
Esip-discovery mailing list
Esip-discovery at lists.esipfed.org
http://www.lists.esipfed.org/mailman/listinfo/esip-discovery

--
Dr. Christopher Lynnes     NASA/GSFC, Code 610.2    phone: 301-614-5185
"Perfection is achieved, not when there is nothing left to add, but when there is nothing left to take away" -- A. de Saint-Exupery

_______________________________________________
esip-semanticweb mailing list
esip-semanticweb at lists.esipfed.org
http://www.lists.esipfed.org/mailman/listinfo/esip-semanticweb
