[esip-semanticweb] Fwd: challenge description

Beth Huffer via esip-semanticweb esip-semanticweb at lists.esipfed.org
Tue Jul 15 19:02:02 EDT 2014


Greetings Semantic Webbies,

As I mentioned at our ESIP session, NASA has issued a challenge in 
connection with automated entity and relationship extraction from web 
content.  Please see details below.

Beth


-------- Original Message --------
Subject: challenge description

Challenge Details: Challenge Seeking Automated Entity and Entity Relationship Extraction from Web Content

Detailed Description & Requirements

Detailed Description:

As NASA's digital content grows, finding information becomes increasingly difficult. With the exponential growth of information, a scalable solution that categorizes content and makes it easier and faster for employees to find the information and knowledge they need to do their jobs is critical. Our current approach, which lacks an integrated Agency solution for searching this vast treasure trove of information, is akin to building multiple libraries across the country without an organizational system or any way for the community to know which location to visit. As work proceeds on an Agency Search Service across NASA's intranet, researching and implementing a cost-efficient, search-engine-agnostic solution for automated entity and relationship extraction and identification will be critical.

This challenge seeks an efficient, automated entity and relationship extraction and identification solution that identifies concepts, people and other entities in NASA's intranet content, enabling better information discovery and new uses of existing information.

Specific solutions, as well as recommendations of technologies or ways to make the Agency Search Service more useful to NASA employees, are appreciated!

The Challenge Owner is also seeking expertise related to this challenge topic across the agency.  If you are an expert in this topic area and would like to be included in a distribution list going forward, please provide your name, position, and contact information in your response to the Challenge Owner.

Solution Requirements:

	• Information/concepts and relationships extracted must be able to be imported into a search engine like the Google Search Appliance, SOLR, Elastic Search, FAST, etc.
	• Completely automated
	• Capable of identifying and learning new concepts and relationships as more information is processed
	• Able to correlate newly extracted information and relationships with already processed information and relationships
	• Capable of utilizing Big Data sources available on the internet to learn and incorporate new concepts
	• Must be able to securely extract content and relationships from moderate-sensitivity data
	• Identifying items beyond basic entities such as concepts, people and locations would be very useful to researchers
		• Examples: mathematical formulas, chemical elements and compounds. Identifying these would increase our ability to relate findings and lessons learned across multiple disciplines and to create search-based applications and mash-ups, providing new ways to use our existing data more efficiently and reducing duplication of work across the Agency.
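As an illustration of the first requirement, one search-engine-agnostic approach is to emit each processed page as a flat JSON document, with extracted entities and relationships stored as multi-valued fields that could drive faceted search. The Python sketch below assumes the extraction step has already produced entity and relation records; the record shapes, the field names (entity_person, relation, etc.) and the pipe-delimited relation encoding are hypothetical choices, and each engine (SOLR, Elastic Search) would still need its own schema or mapping for these fields.

```python
import json


def build_search_document(doc_id, url, text, entities, relations):
    """Flatten extracted entities and relations into one JSON-serializable
    document with multi-valued fields.

    Assumption: entities are dicts with "text" and "type" keys, relations are
    dicts with "subject", "predicate" and "object" keys. All field names are
    hypothetical, not part of any search engine's built-in schema.
    """
    doc = {"id": doc_id, "url": url, "content": text}
    # Group entity strings by type, e.g. "person", "location", "concept".
    for ent in entities:
        field = f"entity_{ent['type']}"
        values = doc.setdefault(field, [])
        if ent["text"] not in values:  # de-duplicate repeated mentions
            values.append(ent["text"])
    # Encode each relation as a "subject|predicate|object" string so it can be
    # indexed and faceted like any other keyword value (hypothetical encoding).
    doc["relation"] = sorted(
        {f"{r['subject']}|{r['predicate']}|{r['object']}" for r in relations}
    )
    return doc


if __name__ == "__main__":
    # Illustrative input only; a real pipeline would get these from an extractor.
    entities = [
        {"text": "Neil Armstrong", "type": "person"},
        {"text": "Apollo 11", "type": "concept"},
        {"text": "Neil Armstrong", "type": "person"},
    ]
    relations = [
        {"subject": "Neil Armstrong", "predicate": "crew_member_of", "object": "Apollo 11"}
    ]
    doc = build_search_document(
        "doc-1", "https://example.com/page", "Sample page text.", entities, relations
    )
    print(json.dumps(doc, indent=2))
```

Keeping the output as plain JSON, rather than tying it to one engine's client library, is what would let the same extraction results be loaded into whichever search engine the Agency Search Service ends up using.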
Other Information:

	• Search Engine Info:
		• Google Search Appliance (Current Search Engine)
			• https://developers.google.com/search-appliance/documentation/614/
		• SOLR
			• http://lucene.apache.org/solr/
		• Elastic Search
			• http://www.elasticsearch.org/
		• FAST (Enterprise Search for SharePoint)
	• Data:
		• All NASA intranet content, including sites like inside.nasa.gov, hq.nasa.gov, io.jsc.nasa.gov, nix.nasa.gov, https://nen.nasa.gov
	• Taxonomy/Ontology sources:
		• https://knowledge.jsc.nasa.gov/index.cfm?event=taxonomy.feedback.home
		• http://wiki.dbpedia.org/Ontology
		• http://www.w3.org/wiki/Good_Ontologies
		• http://vissim.uwf.edu/VOT_Ontology/Ontology.html
		• http://rpc295.cs.man.ac.uk:8080/repository/
		• http://www.w3.org/TR/vocab-org/
		• http://www.ontotext.com/kim
		• http://imu.ntua.gr/software/quonto-quality-ontology-e-gov-portals
	• Examples of Technologies and Tools: these include some that can be used and some that we are currently evaluating. There is no requirement on which technology to use, or that only one be used; combinations or hybrid solutions are acceptable. Below are just some of the available tools for mining, extracting and learning terms, concepts, etc.:
		• General Architecture for Text Engineering (GATE)
			• http://gate.ac.uk/
		• Mallet
			• http://mallet.cs.umass.edu/
		• Weka
			• http://www.cs.waikato.ac.nz/ml/weka/
		• R
			• http://www.r-project.org/
		• Rapidminer
			• http://rapidminer.com/
		• XPLR
			• https://xplr.com/
		• UIMA
			• http://uima.apache.org/
		• OpenCalais
			• http://www.opencalais.com/
	• Examples of Big Data and Search Solutions:
		• This example takes 20 million tweets and not only maps where they came from but also, within seconds, shows related topics in a word cloud for a search run against them. It uses a GPU-based database; while that may not be the solution, it illustrates the types of capabilities we want to create from the massive amount of intranet information available.
			• http://mapd.csail.mit.edu/tweetmap-desktop/
		• Carrot2 clustering search engine: While this technology itself does not work well on NASA's internal data, it is a good example of the faceted search capability we are working to achieve, which is the core need behind this challenge. The metadata on internal content generates too many meaningless facets, which is one of the reasons machine learning is such a critical component of this challenge.
			• http://search.carrot2.org/stable/search
		• Agency Search Portal Prototype: This is a proof-of-concept Agency Search Portal. It includes Public, Public NASA, Internal NASA and Secure NASA content. It can retrieve results from USAJobs, MedPlus, YouTube, Twitter, Instagram, Flickr and other search.usa.gov APIs, plus all the content indexed by the Agency Search Service.
			• http://google.jsc.nasa.gov/go.agency
  

About This Challenge Award

Up to two winners will be eligible to choose one of the following NASA at work Awards:
1. Cool NASA Experience
2. Personalized Astronaut Autographed Portrait for the Winner
3. External Recognition
4. Center Director Recognition




