[Esip-discovery] [esip-semanticweb] time for new challenges?

Sky Bristol sbristol at usgs.gov
Thu Jan 3 11:33:54 EST 2013

If you haven't run across it yet, Google has a slick tool for viewing how their engine picks up structured data in pages.


We've been excited to watch developments in the people, places, and things world of the Knowledge Graph, as we've worked fairly hard to connect dots between these known entities in our own world. Embedding the information and connections gained from our private corporate data assets into publicly exposed web pages seems like a nice way to share information from our somewhat behemoth government systems.

We're doing some work on this right now in our ScienceBase cataloging system, based on the schema.org stuff and some recent rumblings in gov circles about a way of exposing all agency data holdings using RDFa and some evolving notions of a JSON-based indexing system (similar to sitemaps). We're launching the first iteration of our schema.org embedded attribution in the item summary pages (a couple million of them) at www.sciencebase.gov, but we're still working through some goofy issues, like distinguishing between people and organizations for FOAF, since we abstracted them to "parties" in our embedded contact schema (duh!).

We'll be tracking before-and-after analytics and other log analyses to see how things go, but the fundamental idea we are pursuing is to push our information well beyond our doors, so that our ScienceBase API is not the only way of getting at and interacting with our stuff.
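
To make the people-versus-organizations issue above concrete, here is a minimal sketch of mapping an abstract "party" record to a schema.org type and rendering it as RDFa Lite. This is not the ScienceBase implementation: the record shape (a dict with `kind` and `name` keys), the `SCHEMA_TYPES` mapping, and the `party_to_rdfa` function are all hypothetical names made up for illustration. Only the schema.org types `Person`, `Organization`, and `Thing` and the RDFa Lite attributes `vocab`, `typeof`, and `property` come from the published vocabularies.

```python
from html import escape

# Hypothetical mapping from an internal "party" kind to a schema.org type.
SCHEMA_TYPES = {"person": "Person", "organization": "Organization"}


def party_to_rdfa(party):
    """Render one party dict as an RDFa Lite snippet using schema.org terms.

    `party` is an assumed record shape: {"kind": ..., "name": ...}.
    """
    # Fall back to the generic schema.org Thing when the kind is unknown,
    # rather than guessing between Person and Organization.
    schema_type = SCHEMA_TYPES.get(party.get("kind"), "Thing")
    # Escape the name so characters like & and < stay valid in HTML.
    name = escape(party["name"])
    return (
        f'<span vocab="http://schema.org/" typeof="{schema_type}">'
        f'<span property="name">{name}</span></span>'
    )


if __name__ == "__main__":
    print(party_to_rdfa({"kind": "person", "name": "Jane Doe"}))
    print(party_to_rdfa({"kind": "organization", "name": "Example Survey Office"}))
```

The point of the sketch is that the person/organization distinction has to be carried somewhere in the source record. If the "party" abstraction drops it, the only safe output is the generic type.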

We're interested in the same questions you pose and will try to keep up with the conversation. One of our next experiments is to release a KML encoding (format=kml on the end of a RESTful URL) of the georeferenced items in our catalog and publish geositemaps (in batches of 50K items) to see how that impacts discovery through map contexts. After that we plan to build on some work we did a while back with www.freebase.com and push a bunch of our items and connections between items into that venue as well. I'd be interested in hearing from anyone else who is doing any work along these lines.
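
As a rough illustration of the batching idea above, the sketch below splits a list of item identifiers into groups of 50,000 and writes one sitemap file body per group, with each entry pointing at a `format=kml` URL. Several things here are assumptions: the base URL `https://example.org/catalog/item/` is a placeholder, the `?format=kml` query form is just one reading of "format=kml on the end of a RESTful URL", and the output is a plain sitemaps.org `urlset` rather than any geo-specific sitemap extension. The function names are made up for this example.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit in the sitemaps.org protocol
BASE_URL = "https://example.org/catalog/item/"  # placeholder, not a real endpoint


def kml_url(item_id):
    """Build the (assumed) KML URL for one catalog item."""
    return f"{BASE_URL}{item_id}?format=kml"


def batches(items, size=MAX_URLS):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sitemap_xml(item_ids):
    """Return the XML text of one sitemap file listing the given items."""
    entries = "".join(
        f"  <url><loc>{escape(kml_url(i))}</loc></url>\n" for i in item_ids
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{entries}</urlset>\n'
    )


if __name__ == "__main__":
    ids = [str(n) for n in range(120_000)]
    files = [sitemap_xml(batch) for batch in batches(ids)]
    print(len(files), "sitemap files")
```

With a couple million items this would produce a few dozen files, which would normally be listed together in a sitemap index file.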



On Jan 3, 2013, at 8:40 AM, "Lynnes, Christopher S. (GSFC-6102)" <christopher.s.lynnes at nasa.gov> wrote:

> I admit to being slow to see how servicecasting would solve our search engine discovery problem, but after running across the Google video about their Knowledge Graph (http://www.youtube.com/watch?v=mmQl6VGvX-c), I think I am finally starting to see the light:
> Instead of attacking Search Engine Optimization in the traditional way, maybe we should target the knowledge graph efforts of Bing and Google.  That is, make our information available in a way that the knowledge graph picks it up.  Hence the inclusion of the Semantic Web group in this message.
> If we go that way, there are several things we need to understand about how Google/Bing consume semantic web information, e.g.:
> a) encoding: RDF/XML, RDFa or something else?
> b) data model:  what classes and/or properties are the search engines paying attention to?
> c) graph traversal:  how and how far do the engines traverse the graph for any given search or result?
> d) instance preferences:  are there particular instance entities that search engines weight preferentially?
> Note that this implies that we solve the hooking of service casting to data with linked open data as a component of the solution...don't know if that is currently on the radar screen, but my impression from hearing Brian Wilson talk about publishing LOD is that it is on the radar for servicecasting.
> Also, taking this LOD approach has benefits outside of Search Engine Optimization.  It might even be possible to take off from what we learn from the ToolMatch effort.
> On Dec 19, 2012, at 12:24 PM, Ruth Ellen Duerr wrote:
>> OK, so I have to weigh in here.  This is after all the whole point of service casting hooked to data and the Libre project in general.  I think we made huge strides in this area over the last couple of years.  Yes, there are a few issues - crawl frontier management and the data equivalent of page rank being the major ones; but other than that we have demonstrated the ability to find getCapabilities docs, OSDDs, web enabled folders, OAI-PMH catalogs, etc. wherever they are on the web.  And building a Google for data is a major interest of mine.  I've even come up with what I think could be that killer app - though I am totally uninterested in starting a startup, even if Boulder is a hub for doing that (yes, it is a cheap, for-pay kind of idea).
>> So, yes this is a good topic; but it is one we are already addressing albeit very slowly in the Discovery cluster.
>> Ruth
>> On Dec 19, 2012, at 8:37 AM, "Ramirez, Paul M (388J)" <paul.m.ramirez at jpl.nasa.gov> wrote:
>>> Hi Chris,
>>> Seems like a worthy and tractable cause to help those within the
>>> organization to understand how to increase the visibility of their web
>>> applications and sites. One of the first steps could be to understand how
>>> a search engine views an application or site. For this there are already a
>>> large variety of tools[1][2][3]. These tools of course are just a portion
>>> of the solution but could help frame the problem. That said, is this an
>>> ESIP level issue or more at the level of each organization? Moreover, it
>>> seems as though understanding one's web presence is an ongoing task
>>> as it would evolve as the search engines evolve and as we bring online
>>> more web applications. Anyhow just some thoughts, I'm all for supporting
>>> an activity that gets exposure for the great applications that have already
>>> been built.
>>> Thanks,
>>> Paul Ramirez
>>> Jet Propulsion Laboratory
>>> (818) 354-1015
>>> [1] https://www.google.com/webmasters/tools/home?hl=en&pli=1
>>> [2] http://support.google.com/webmasters/bin/answer.py?hl=en&answer=158587
>>> [3] http://www.screamingfrog.co.uk/seo-spider/
>>> On 12/19/12 5:58 AM, "Lynnes, Christopher S. (GSFC-6102)"
>>> <christopher.s.lynnes at nasa.gov> wrote:
>>>> Here is one not obviously related to OpenSearch...Faisal Hossain (this
>>>> year's winner of the Falkenberg!) wrote an editorial in BAMS lamenting
>>>> the difficulty of discovering useful web applications (e.g., Giovanni)
>>>> via major search engines by the applications' data content:
>>>> http://journals.ametsoc.org/doi/full/10.1175/BAMS-D-12-00035.1
>>>> Hook and I have talked with Microsoft, and indirectly with Google, and
>>>> there appears to be no quick and easy silver bullet.
>>>> This problem appears to be widespread, not just Giovanni. Is this
>>>> something the Discovery Cluster should tackle?
>>>> --
>>>> Dr. Christopher Lynnes, NASA/GSFC, ph: 301-614-5185
>>>> _______________________________________________
>>>> Esip-discovery mailing list
>>>> Esip-discovery at lists.esipfed.org
>>>> http://www.lists.esipfed.org/mailman/listinfo/esip-discovery
> --
> Dr. Christopher Lynnes     NASA/GSFC, Code 610.2    phone: 301-614-5185
> "Perfection is achieved, not when there is nothing left to add, but when there is nothing left to take away" -- A. de Saint-Exupery
> _______________________________________________
> esip-semanticweb mailing list
> esip-semanticweb at lists.esipfed.org
> http://www.lists.esipfed.org/mailman/listinfo/esip-semanticweb
