[ESIP-all] Request for Information on Data Brokering/Mediation services
jones at nceas.ucsb.edu
Tue Sep 11 17:56:07 EDT 2012
Dear Siri Jodha,
Inline below are some initial responses representing the DataONE data
repository federation from me and Dave Vieglais. Let me know if you have
further questions -- we'd be happy to discuss the growing DataONE
federation. Beyond your questions below, you can find additional
information about DataONE on our web site (http://www.dataone.org),
including extensive architectural documentation (direct link:
http://mule1.dataone.org/ArchitectureDocs-current/) that discusses our
interoperability approach to issues such as cross-institutional web
services, user identification, data identification, data packaging, data
versioning, metadata, etc.
Matthew B. Jones
Director of Informatics Research and Development
National Center for Ecological Analysis and Synthesis (NCEAS)
University of California, Santa Barbara
DataONE co-lead for Core Cyberinfrastructure Team
On Tue, Sep 11, 2012 at 10:37 AM, Siri Jodha Singh Khalsa <
siri.khalsa at colorado.edu> wrote:
> Request for Information on Existing Data and Information Mediation Services
> The US National Science Foundation is developing a new data and knowledge
> management system for the 21st Century, called EarthCube<http://earthcube.ning.com/>,
> with a goal of providing geoscientists with the data and tools needed to
> improve interdisciplinary science and education. Differences in the
> attitudes, protocols and semantics used in sharing information across
> scientific communities are one of the key challenges that EarthCube must
> address. Solutions based on having all disciplines adopt a common discovery
> and access technology are deemed unrealistic so we are looking for
> instances of interdisciplinary interoperability supported by means of
> If you use or know of any software or service that mediates the
> interactions between multiple, heterogeneous systems to facilitate the
> discovery, access, processing or semantic interpretation of data, we would
> very much appreciate receiving your answers to the questions below. If you
> cannot answer all the questions please provide a contact for obtaining
> further information.
> Your response will be most useful to us if received by September 24. You
> may either reply to this email or send your response to Siri Jodha Khalsa at
> khalsa at colorado.edu.
> Thank you for your cooperation.
> 1. What is your name, organization and contact information?
> Matthew Jones, NCEAS, UC Santa Barbara
jones at nceas.ucsb.edu
Dave Vieglais, University of Kansas, vieglais at ku.edu
> 1. What is the name of the software/service that performs mediation?
> DataONE Service Interface for repository
> 1. In what application domain is it being used?
> Earth, ecological, biodiversity, and environmental
> 1. Who created the software/service? Who currently maintains it?
> The DataONE project, a multi-institutional
collaborative federation of repository member nodes, coordinating nodes,
and software systems, created and maintains the system. Co-PIs on the NSF
DataNet project that provides initial funding for DataONE are at UNM, UCSB,
U Tennessee, KU, USGS, and dozens of additional universities.
> 1. How is the software/service accessed? (web-based, grid-based,
> REST-based web services act as a lightweight
interoperability layer to mediate communications among repository nodes,
coordinating nodes, and scientific software applications. Currently there
are four service Tiers, and a participating node can opt to join at any of
the four Tiers (with increasing Tiers providing successively more
> 1. If the software/service supports data discovery by distributing
> queries to multiple repositories of data or information, must those
> repositories all use the same discovery protocol?
> Discovery is based on a geographically distributed
index of all metadata in the federation, which is created through a
semantic mapping and extraction process from metadata standards used
throughout the domain. Over time, the DataONE discovery service will add
support for additional search protocols.
> 1. Can the mediation software/service interpret records that use
> different data models or metadata standards?
> Yes, we have a semantic cross-walk that allows us to
extract metadata from multiple standards, including EML, FGDC, METS, ISO
19115, and others. We extend the cross-walk whenever a repository joins the
federation with a widely used metadata standard that is not yet supported.
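
To make the cross-walk idea concrete, here is a minimal sketch of how fields from two metadata standards could be mapped onto one common index record. It is an illustration only, not DataONE's actual cross-walk: the CROSSWALK table, the extract_index_record function, and the simplified, namespace-free sample documents are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical cross-walk table: for each metadata standard, the element
# path that supplies each common index field. Paths are simplified and
# ignore XML namespaces, which real EML and FGDC documents may use.
CROSSWALK = {
    "EML": {
        "title": "dataset/title",
        "abstract": "dataset/abstract/para",
    },
    "FGDC": {
        "title": "idinfo/citation/citeinfo/title",
        "abstract": "idinfo/descript/abstract",
    },
}


def extract_index_record(standard, xml_text):
    """Map one metadata document onto the common index fields."""
    root = ET.fromstring(xml_text)
    record = {"standard": standard}
    for field, path in CROSSWALK[standard].items():
        element = root.find(path)
        has_text = element is not None and element.text
        record[field] = element.text.strip() if has_text else None
    return record


# Made-up sample documents, one per standard.
EML_SAMPLE = (
    "<eml><dataset><title>Kelp forest survey</title>"
    "<abstract><para>Annual counts.</para></abstract></dataset></eml>"
)
FGDC_SAMPLE = (
    "<metadata><idinfo><citation><citeinfo><title>Stream gauge data</title>"
    "</citeinfo></citation><descript><abstract>Daily discharge.</abstract>"
    "</descript></idinfo></metadata>"
)

if __name__ == "__main__":
    for standard, document in (("EML", EML_SAMPLE), ("FGDC", FGDC_SAMPLE)):
        print(extract_index_record(standard, document))
```

Under this design, supporting an additional standard means adding one more entry to the table rather than changing the extraction logic.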
> 1. Does the software/service require that the repositories it accesses
> possess any common element (software, protocol, data model, etc.)?
> Yes, interoperability is achieved through common
member agreement on the web service interfaces among system components.
The DataONE coordinating nodes maintain a Node Registry that indicates
which service tiers are supported by each participating member node
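
As a rough illustration of how a client might use such a registry, the sketch below models registry entries in memory and selects the nodes that support a required tier. The NodeEntry class, the node identifiers, the example.org URLs, and the assumption that each tier includes the capabilities of the tiers below it are all hypothetical and are not taken from the DataONE interface definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeEntry:
    """One hypothetical registry entry for a member node."""
    node_id: str
    base_url: str
    tier: int  # highest service tier the node supports (1 to 4)


def nodes_supporting(registry, required_tier):
    """Return the nodes whose tier is at least required_tier.

    Assumes tiers are cumulative, so a Tier 3 node also offers the
    Tier 1 and Tier 2 services.
    """
    if not 1 <= required_tier <= 4:
        raise ValueError("required_tier must be between 1 and 4")
    return [node for node in registry if node.tier >= required_tier]


# Made-up registry contents for illustration only.
REGISTRY = [
    NodeEntry("mn-alpha", "https://alpha.example.org/mn", 1),
    NodeEntry("mn-beta", "https://beta.example.org/mn", 3),
    NodeEntry("mn-gamma", "https://gamma.example.org/mn", 4),
]

if __name__ == "__main__":
    for node in nodes_supporting(REGISTRY, 3):
        print(node.node_id, node.base_url)
```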
> 1. Can the software/service itself be accessed via different protocols?
> The service is mainly accessed via web services over HTTP,
from clients written in multiple languages and environments. Although the
existing services all use HTTP, we are planning to support additional
protocols by allowing member nodes to advertise other protocols that they
might expose for specific data sets as part of the Node Registry that
> 1. Does software/service provide direct access to data after discovery?
> Yes, DataONE provides for direct access to data,
and has mechanisms to ensure that data is persistent and valid over time,
even when particular repository partners might cease to participate. The
model for exposing data makes the data machine-accessible, and does not
require a human agent to navigate web landing pages or other barriers.
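
A minimal sketch of what machine-accessible retrieval could look like from a client's point of view: given a node's base URL and a data identifier, the client builds a URL and fetches the bytes directly, with no landing page in between. The base URL, the "/object/" path segment, and the identifier shown are assumptions for illustration; the architecture documentation linked above is the authority on the actual service paths.

```python
from urllib.parse import quote
from urllib.request import urlopen


def object_url(base_url, identifier):
    """Build a retrieval URL for one identifier.

    The "/object/" path segment is an assumed layout. The identifier is
    percent-encoded in full so characters such as ":" and "/" cannot be
    mistaken for URL structure.
    """
    return f"{base_url.rstrip('/')}/object/{quote(identifier, safe='')}"


def fetch_object(base_url, identifier, timeout=30):
    """Download the bytes for one identifier (requires network access)."""
    with urlopen(object_url(base_url, identifier), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Hypothetical node URL and identifier; only the URL is printed here.
    print(object_url("https://mn.example.org/mn/v1/", "example:dataset/1.1"))
```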
> 1. Can the software/service mediate between different ontologies,
> vocabularies, or natural languages?
> Yes, it mediates among multiple metadata
standards. Ongoing work by the DataONE semantics and integration working
group will produce additional semantic interoperability at various points
in the federation.
> 1. Is there a maximum number of different sources that the
> software/service can support? How many concurrent users can it support?
> What are typical response times during interactive sessions? What are its
> known performance bottlenecks?
> No, there is no maximum, as the federation is
built to scale effectively with each additional repository that joins as a
Member Node. Storage is distributed and replicated at DataONE Member
Nodes. Discovery queries typically return in under a second. Data upload
and download occur directly against Member Nodes, so transfer times depend
on the network connection between the client and the Member Node and on the
data size of a given transaction.
> 1. Can the software/service orchestrate the connection and execution
> of heterogeneous workflows/processing services?
> Yes, the web service interfaces support
connections from many different applications, including workflow systems
such as Kepler and VisTrails, analytical systems such as R and Matlab, and
arbitrary other clients. The system supports both read and write access to
data (subject, of course, to access control rules on particular data
> 1. Will the software work in a high performance computing or non-web
> Yes, the system can be deployed in HPC
environments. For example, experimental DataONE Member Nodes have been
demonstrated at TeraGrid sites. Authentication is integrated with the
CILogon X.509 certificate service, which is used in many HPC environments