[Esip-discovery] time for new challenges?

Thu Dec 20 15:58:15 EST 2012

Hey Chris,

On 12/20/12 6:12 AM, "Lynnes, Christopher S. (GSFC-6102)"
<christopher.s.lynnes at nasa.gov> wrote:

>
>On Dec 19, 2012, at 10:35 PM, "Mattmann, Chris A (388J)"
><chris.a.mattmann at jpl.nasa.gov> wrote:
>
>> Yep, this is the classic difference between "vertical" (specialized) and
>> "horizontal" (general) search engines. This is why, e.g., Fandango has its
>> own *way* better search for movies than Google would.
>
>However, newbie data users don't know to go to the vertical search engine
>in the first place.  The bottom line, therefore, is this:

+1

>
>How can I make my web application / portal show up on the first page of a
>Google or Bing search, assuming a minimum amount of user pre-knowledge
>and using fairly general words like "rainfall", "snow", etc.  (i.e.,
>without specific acronyms)?

My message to you -- I am not sure that's a tractable problem :)

Google, Bing, etc., are huge companies with proprietary algorithms for
ranking, SEO, content manipulation, etc. IOW, they are such big
"horizontal" search engines that the "vertical" space is, IMHO, more
tractable. For the horizontal engines, ranking is the name of the game;
they are 10B USD businesses built on it. Gaming them has been tried by
larger organizations than us (i.e., the ESIP Discovery Cluster), and it's
hard to do in general. They have link/spam/redirection algorithms that we
know about (the ones they have published), but they also have the things
that *we don't know about*, and that's my point: those are fairly hard to
influence.
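
To make the "rainfall" point quoted further down concrete (a general-purpose portal mentions the word once, a weather site mentions it everywhere), here is a toy scorer using textbook TF-IDF. This is only one of the published, classic ingredients of ranking; it is not how Google or Bing actually rank, since they combine many signals that are not public. The page texts and names are invented for illustration, and the log(1 + N/df) smoothing is an assumed variant chosen so the score stays positive with only two pages.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_scores(query, docs):
    """Score each document against the query with a summed TF-IDF.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(1 + N / df), a smoothed variant that stays positive even when
          every document contains the term (N = number of documents,
          df = number of documents containing the term)
    """
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for toks in tokenized.values() if term in toks)
            if df == 0 or not tokens:
                continue  # term absent everywhere, or empty document
            tf = counts[term] / len(tokens)
            score += tf * math.log(1 + n_docs / df)
        scores[name] = score
    return scores


# Hypothetical page texts, invented for illustration only.
pages = {
    "data_portal": (
        "Data portal for Earth science: subset, visualize and download "
        "satellite data products including aerosols, ozone, sea surface "
        "temperature and rainfall."
    ),
    "weather_site": (
        "Rainfall totals today. Rainfall radar, rainfall forecast and "
        "rainfall records for your city."
    ),
}

scores = tf_idf_scores("rainfall", pages)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Both pages "match" the query, but the weather page's much higher term frequency puts it first, which is the competition a general-purpose portal faces for a general word.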

>
>That said, I think Pedro and Ruth may be onto something with the ideas of
>datacasting (to increase the static universe of pages likely to be
>indexed, and thus the chance of coming up top in the hits) and rich
>OpenSearch responses (which, again, may simply be static datacasts as well).

I think those are fine approaches, and they may have localized success in
certain areas. I'm just saying that many of us who have been close to the
search engine domain for the past decade have gone to great lengths to
figure out how to influence search engines, even going so far as to
create open versions of them and entire communities of 1000s of
individuals around them. We've had success in some areas; it's just
taken a long time :)
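
One possible reading of "static datacasts" / rich OpenSearch responses: precompute the OpenSearch 1.1 Atom response for a common, general query and publish it as a plain file, so a crawler can fetch it without filling in a search form or running AJAX. A minimal sketch using only Python's standard library; the URLs, dataset records, author name and function names are hypothetical placeholders, and the output has not been run through a feed validator.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace
ET.register_namespace("opensearch", OPENSEARCH_NS)


def atom(tag):
    return f"{{{ATOM_NS}}}{tag}"


def osearch(tag):
    return f"{{{OPENSEARCH_NS}}}{tag}"


def static_opensearch_response(search_terms, records, feed_id, updated, author):
    """Build an OpenSearch 1.1 Atom response for one fixed query (hypothetical helper).

    records: list of dicts with "title", "url" and "summary" keys.
    updated: RFC 3339 timestamp used for the feed and every entry.
    """
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("title")).text = f"Datasets matching '{search_terms}'"
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("updated")).text = updated
    ET.SubElement(ET.SubElement(feed, atom("author")), atom("name")).text = author
    # OpenSearch response elements describing this single page of results.
    ET.SubElement(feed, osearch("totalResults")).text = str(len(records))
    ET.SubElement(feed, osearch("startIndex")).text = "1"
    ET.SubElement(feed, osearch("itemsPerPage")).text = str(len(records))
    ET.SubElement(feed, osearch("Query"), role="request", searchTerms=search_terms)
    for rec in records:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("title")).text = rec["title"]
        ET.SubElement(entry, atom("id")).text = rec["url"]
        ET.SubElement(entry, atom("updated")).text = updated
        ET.SubElement(entry, atom("link"), href=rec["url"])
        # The summary carries the human-readable text a crawler would see.
        ET.SubElement(entry, atom("summary")).text = rec["summary"]
    return ET.tostring(feed, encoding="unicode")


# Hypothetical dataset records.
records = [
    {
        "title": "Daily gridded rainfall (hypothetical)",
        "url": "https://data.example.org/datasets/rainfall-daily",
        "summary": "Daily rainfall totals on a global grid from satellite observations.",
    },
    {
        "title": "Monthly rainfall climatology (hypothetical)",
        "url": "https://data.example.org/datasets/rainfall-monthly",
        "summary": "Long-term monthly mean rainfall on a global grid.",
    },
]

xml_text = static_opensearch_response(
    "rainfall",
    records,
    feed_id="https://data.example.org/opensearch/rainfall.xml",
    updated="2012-12-20T00:00:00Z",
    author="Example Data Center",
)
print(xml_text)
```

Saving `xml_text` under a stable URL and linking to it from an ordinary HTML page would add it to the static universe of pages; whether Google or Bing actually index or rank such files well is exactly the open question in this thread.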

Perhaps that's not a concern to you, and you are proposing this knowing
that it *is* going to take a long time. In that case, I believe this dead
horse has been beaten and I'll step down off my soap box.

>
>(As a side note:  I'm also happy to report that this discussion has made
>me realize how we can provide a fully open API to Giovanni as a service,
>which now seems so blindingly obvious that I just want to say "Duh".)

+1

>
>Anyway, we are going to talk about Grand Challenges at ESIP, so I would
>like to put this out there as a possibility.  Also, as good as the
>suggestions in this discussion are, I would be interested in seeing how
>we might obtain numerical evidence of how well they work in real life
>with Google and Bing.

I have a list of readings on this difficult topic that I can dig up and
share. Some of them are in the syllabus for the course I linked you to
(see the required readings/papers).
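
On the question of numerical evidence, one simple, standard IR measure is mean reciprocal rank (MRR): for a fixed set of general queries, record where our pages first appear in a search engine's result list and average 1/rank, counting 0 when they don't appear at all. A sketch, assuming the result lists have already been collected somehow (by hand or through a search API, not shown); the domain, queries and result lists below are invented placeholders, not real Google or Bing output.

```python
from urllib.parse import urlparse


def first_rank(result_urls, our_domain):
    """Return the 1-based position of the first result hosted on our_domain, or None."""
    for position, url in enumerate(result_urls, start=1):
        host = urlparse(url).hostname or ""
        if host == our_domain or host.endswith("." + our_domain):
            return position
    return None


def mean_reciprocal_rank(results_by_query, our_domain):
    """Average 1/rank over all queries; queries where we never appear count as 0."""
    if not results_by_query:
        return 0.0
    total = 0.0
    for urls in results_by_query.values():
        rank = first_rank(urls, our_domain)
        if rank is not None:
            total += 1.0 / rank
    return total / len(results_by_query)


# Placeholder result lists, invented for illustration (NOT real search-engine output).
observed = {
    "rainfall": [
        "https://weather.example.com/rain",
        "https://portal.example.org/rainfall",
    ],
    "snow": [
        "https://portal.example.org/snow",
        "https://news.example.net/snow",
    ],
    "sea ice": [
        "https://encyclopedia.example.com/sea-ice",
        "https://news.example.net/ice",
    ],
}

mrr = mean_reciprocal_rank(observed, "portal.example.org")
print(f"MRR = {mrr:.2f}")  # prints "MRR = 0.50" for these placeholders
```

Re-running the same query set before and after a change (for example, publishing static pages) gives a before/after comparison; with only a handful of queries, small differences are likely to be noise.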

Cheers,
Chris

>
>>
>> FYI my class at USC (CSCI 572) on Information Retrieval and Search Engines:
>>
>> http://sunset.usc.edu/classes/cs572_2011/
>>
>> Cheers,
>> Chris
>>
>> On 12/19/12 10:41 AM, "Lynnes, Christopher S. (GSFC-6102)"
>> <christopher.s.lynnes at nasa.gov> wrote:
>>
>>> On Dec 19, 2012, at 12:13 PM, jeff mcwhirter wrote:
>>>
>>>> On 12/19/12 9:52 AM, Mattmann, Chris A (388J) wrote:
>>>>>
>>>>> SEO is an extremely difficult problem, couched in Information
>>>>> Retrieval Research/theory. Most advances are wholly incremental or
>>>>> point solutions that aren't widespread as of yet.  Most of that has to
>>>>> do with the search engine companies guarding their intimate
>>>>> optimizations and ranking secrets very closely.
>>>>>
>>>>
>>>> I don't think it is really an issue of traditional search engine
>>>> optimization but rather that there often isn't a crawlable site.  It
>>>> seems like most repositories are search-oriented, and what pages are
>>>> there don't have much in the way of text corpus to index.   If the
>>>> pages and the text aren't there, Google isn't going to index it.
>>>>
>>>> -Jeff
>>>>
>>>
>>> Yes, this is a problem, esp. as we go to more AJAX-populated content to
>>> provide a better UX for the users.
>>>
>>> Another issue is that general-purpose portals/applications have
>>> difficulty "competing" with more specialized sites. "Rainfall" may show
>>> up once in a description for a general purpose application, but a
>>> weather site, for example, is all "rainfall" this and "rainfall" that.
>>> --
>>> Dr. Christopher Lynnes     NASA/GSFC, Code 610.2    phone: 301-614-5185
>>> "Perfection is achieved, not when there is nothing left to add, but
>>> when there is nothing left to take away" -- A. de Saint-Exupery
>>>
>>> _______________________________________________
>>> Esip-discovery mailing list
>>> Esip-discovery at lists.esipfed.org
>>> http://www.lists.esipfed.org/mailman/listinfo/esip-discovery
>>
>
>--
>Dr. Christopher Lynnes, NASA/GSFC, ph: 301-614-5185