[ESIP-all] Data Profiles, Access Metrics, and ORCID Logins Now Available on DataONE

Amber E Budden aebudden at gmail.com
Wed Jan 27 12:58:57 EST 2016


*Users of DataONE can now access detailed metrics on their uploaded data
through our new data profiles.*

January marked the launch of version 2 of the DataONE systems supporting
our network of 33 Member Nodes. Designed in response to community input,
version 2 supports data profiles; enables sign-in through ORCID, Google, or
university affiliations; streamlines access for client tools such as R; and
adds a host of new technical features that make it easier for Member Node
repositories to manage their data.

Data profiles let researchers know their data are being accessed.  By
signing in to DataONE Search, users can now access detailed metrics on
datasets that have been uploaded into repositories within the DataONE
network of Member Nodes.  For example, the new data profile page for
researcher Jennifer Balch
<https://search.dataone.org/#profile/uid=jkbalch,o=unaffiliated,dc=ecoinformatics,dc=org>
provides a summary of her individual records and contributions, information
on the total number of data and metadata downloads, temporal trends in
downloads, counts of data and metadata uploads, summaries of file formats
used, and a graphical representation of the time range over which
data were collected.  These features are provided within the DataONE Search
interface, facilitating a seamless transition between reviewing personal
data-level metrics and discovering other data within the network.  The same
information can be aggregated and viewed for entire Member Node
Repositories (such as the KNB; https://search.dataone.org/#profile/KNB),
and for the whole DataONE network (https://search.dataone.org/#profile).
An overview of these features, and of the sign-in process, is provided in
two of our newly released DataONE Search screencast tutorials at
https://www.dataone.org/dataone-search-screencasts

DataONE supports ORCIDs <http://orcid.org/>!  When signing in to DataONE
Search using ORCID, users can connect their publicly available ORCID data
with their DataONE profile. ORCID (Open Researcher and Contributor ID) is a
platform-independent, persistent digital identifier that provides
consistent researcher identification for scientific and academic
scholarship over time.

We’ve made signing in to DataONE much simpler for people using
DataONE-enabled tools such as the DataONE R package, the R client that
provides read/write access to DataONE Member Node repositories. Now,
users only need to copy their authentication token from their
DataONE profile page to sign in and upload data from within R. This
functionality will soon be extended to users of Matlab. From their profile
page, users can also create ‘groups’ to manage access and
permissions to their data, enabling collaboration on a private data set
before publication, or to share editing and publishing privileges across
multiple users.
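
As a rough illustration of what token-based sign-in amounts to on the wire,
the sketch below builds an HTTP request that carries a token in a bearer
`Authorization` header, the mechanism listed under the Version 2.0 features.
It is written in Python rather than R and is not the DataONE R package's
code. The base URL, the `/object/` path, the example identifier, and the
function name `build_authenticated_request` are assumptions made for
illustration; the technical documentation is the place to confirm the
actual endpoints.

```python
import urllib.request
from urllib.parse import quote

# Assumption: base URL of a DataONE Version 2 REST service. Replace it with
# the address of the node you actually use.
BASE_URL = "https://cn.dataone.org/cn/v2"


def build_authenticated_request(identifier: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries a bearer token."""
    # Identifiers may contain ':' and '/', so percent-encode them for the path.
    url = f"{BASE_URL}/object/{quote(identifier, safe='')}"
    request = urllib.request.Request(url, method="GET")
    # The token copied from the profile page is sent as a bearer credential.
    request.add_header("Authorization", f"Bearer {token}")
    return request


if __name__ == "__main__":
    # Hypothetical identifier and placeholder token, for illustration only.
    req = build_authenticated_request("doi:10.5063/EXAMPLE", "PASTE_TOKEN_HERE")
    print(req.full_url)
    print(req.get_header("Authorization"))
```

The request is only constructed here, never sent, so the sketch can be run
without a valid token. Tokens should be treated like passwords and kept out
of shared scripts.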

For more detailed technical information on version 2 of the DataONE service
infrastructure, including information on series identifiers and metadata
control, please read below.

Information about DataONE Search and other DataONE tools can be found at:
https://www.dataone.org/investigator-toolkit


To search for data or access your data profile go to:
https://search.dataone.org/

For screencast tutorials see:
https://www.dataone.org/dataone-search-screencasts


Version 2.0 Features

Version 2.0 of the DataONE service infrastructure was released over the
course of December 2015 and followed by a number of updates during January
of 2016. This major upgrade to DataONE services implements many changes
in the background and lays the foundation for a host of new features that
the DataONE team expects to release on a regular basis.

The Version 2.0 services represent an evolutionary improvement from Version
1, and are fully backwards compatible. Existing infrastructure, such as
Member Nodes and Investigator Tools, does not need to be upgraded
immediately and may continue operating in Version 1.

Most of the new functionality offered by Version 2 is a direct result of
feedback provided by the community of DataONE users, including both Member
Nodes and investigators, over several years of production operations.
Two such major changes in Version 2 are support for mutable content
through the use of "Series Identifiers" and the transition of authoritative
system metadata control from Coordinating Nodes back to the Member Nodes.

Series Identifiers

All content in DataONE is immutable and uniquely identified by a Persistent
Identifier (PID) in order to support repeatable analysis use cases.

In Version 2, we have introduced the Series Identifier (SID) to support a
new use case that ensures that the latest revision of a data set can be
retrieved by its identifier. In Version 1, it was necessary to query the
system for available revisions of each object in order to locate the most
recent revision of a data set or its components. The availability of SIDs
in Version 2.0 offers increased efficiency for the user looking for the
most recent version of a data set.
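
The difference between the two lookups can be sketched with a toy in-memory
catalog. This is an illustrative model in Python, not DataONE's
implementation: the class `Catalog`, its methods, and the identifiers
`pid.1` and `sid.A` are all hypothetical. The chain walk stands in for the
Version 1 style revision queries, while `resolve` shows the Version 2 idea
that a SID yields the newest revision and a PID yields exactly itself.

```python
from typing import Dict, List, Optional


class Catalog:
    """Toy catalog of immutable objects grouped into series (hypothetical)."""

    def __init__(self) -> None:
        # pid -> pid of the revision that replaced it (None for the newest)
        self.obsoleted_by: Dict[str, Optional[str]] = {}
        # sid -> pids in the series, oldest first
        self.series: Dict[str, List[str]] = {}

    def add_revision(self, pid: str, sid: str) -> None:
        """Register a new immutable object as the newest revision of a series."""
        if pid in self.obsoleted_by:
            raise ValueError(f"PID already in use: {pid}")
        revisions = self.series.setdefault(sid, [])
        if revisions:
            # The previous head of the series is now obsoleted by the new PID.
            self.obsoleted_by[revisions[-1]] = pid
        self.obsoleted_by[pid] = None
        revisions.append(pid)

    def latest_by_chain(self, pid: str) -> str:
        """Version 1 style: follow revision links one object at a time."""
        current = pid
        while self.obsoleted_by[current] is not None:
            current = self.obsoleted_by[current]
        return current

    def resolve(self, identifier: str) -> str:
        """Version 2 style: a SID gives the latest revision, a PID gives itself."""
        if identifier in self.series:
            return self.series[identifier][-1]
        if identifier in self.obsoleted_by:
            return identifier
        raise KeyError(identifier)


if __name__ == "__main__":
    catalog = Catalog()
    for pid in ("pid.1", "pid.2", "pid.3"):
        catalog.add_revision(pid, "sid.A")
    print(catalog.resolve("sid.A"))          # newest revision of the series
    print(catalog.resolve("pid.1"))          # the exact object that was named
    print(catalog.latest_by_chain("pid.1"))  # same answer as the SID, more steps
```

In this model the chain walk needs one lookup per revision, while the SID
lookup takes a single step regardless of how many revisions exist.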

The availability of both SIDs and PIDs for identifying content provides
more flexibility for content providers and consumers and aligns well with
typical use patterns. The most important distinction is that a PID will
always refer to an exact version of an object whereas a SID will always
refer to the latest revision of a series.

System Metadata Control

In Version 1 of the infrastructure, system metadata (access control rules,
replica information, and other details) was created at the Member Nodes,
but always managed by Coordinating Nodes. This simple method of operation
meant that Coordinating Nodes always held the most up-to-date information
about any object in the DataONE federation. However, it also meant that any
changes to system metadata, such as an update to access control rules,
would always need to be done at a Coordinating Node. This would sometimes
cause an undesirable latency in returning updates to a Member Node.

In Version 2, the Member Node holds the authoritative copy of system
metadata. Changes to metadata such as access control rules
may be created, distributed, and reflected quickly at Member Nodes.
Coordinating Nodes are notified of such edits, and prioritize updates of
the system metadata to themselves and to other Member Nodes holding
replicas. The overall benefit of this change is a significant reduction in
latency for common operations that alter system metadata, and a generally
more responsive environment for content curators.
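
The Version 2 flow described above can be pictured with a small simulation.
This is a toy model in Python under stated assumptions, not the DataONE
service API: the classes `MemberNode` and `CoordinatingNode`, the methods
`update_access` and `notify_changed`, and the `revision` counter are
hypothetical names chosen for illustration.

```python
import copy
from typing import Dict, List


class CoordinatingNode:
    """Keeps copies of system metadata and refreshes replica holders (toy model)."""

    def __init__(self) -> None:
        self.copies: Dict[str, dict] = {}
        self.replica_holders: Dict[str, List["MemberNode"]] = {}

    def notify_changed(self, source: "MemberNode", pid: str) -> None:
        # Pull the authoritative record from the Member Node that changed it.
        record = copy.deepcopy(source.system_metadata[pid])
        self.copies[pid] = record
        # Push the updated record to every Member Node holding a replica.
        for holder in self.replica_holders.get(pid, []):
            holder.system_metadata[pid] = copy.deepcopy(record)


class MemberNode:
    """Holds the authoritative system metadata for its own objects (toy model)."""

    def __init__(self, name: str, coordinator: CoordinatingNode) -> None:
        self.name = name
        self.coordinator = coordinator
        self.system_metadata: Dict[str, dict] = {}

    def update_access(self, pid: str, readers: List[str]) -> None:
        record = self.system_metadata.setdefault(pid, {"readers": [], "revision": 0})
        # The change takes effect locally first, with no round trip.
        record["readers"] = list(readers)
        record["revision"] += 1
        # The Coordinating Node is told about the edit afterwards.
        self.coordinator.notify_changed(self, pid)


if __name__ == "__main__":
    cn = CoordinatingNode()
    origin = MemberNode("origin", cn)
    replica = MemberNode("replica", cn)
    cn.replica_holders["pid.1"] = [replica]

    origin.update_access("pid.1", ["public"])
    print(origin.system_metadata["pid.1"])
    print(cn.copies["pid.1"])
    print(replica.system_metadata["pid.1"])
```

The point of the sketch is the ordering: the edit is visible at the
originating Member Node before any other node has been contacted, which is
where the reduction in latency comes from.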

Other Significant Changes

DataONE infrastructure has always proven reliable (with >99.999% uptime
since starting production operations in July 2012). There is, however,
always room for improvement. Version 2.0 also includes numerous
internal adjustments to improve efficiency and reliability of the overall
infrastructure:


   - Solr 5 and Zookeeper for distributed indexing
   - Full support for suggested and actual content file names and media type
   - Improved usage log aggregation with COUNTER support
   - New bearer token authentication mechanism that operates in parallel with
     the existing certificate based authentication process
   - Support for authentication by ORCID in addition to CILogon and the
     InCommon Federation of identity providers
   - Numerous bug fixes and performance improvements


Technical documentation of the services offered in Version 2.0 is available
for review at: https://purl.dataone.org/architecture-dev


About DataONE:

DataONE <http://www.dataone.org/> enables universal access to data and
helps researchers meet their data management needs and provide secure and
permanent access to their data. DataONE offers the
scientific community a suite of tools and training materials that cover all
aspects of the data life cycle from data collection to management, analysis
and publication.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DataONESearchUserProfile.png
Type: image/png
Size: 175226 bytes
Desc: not available
URL: <http://www.lists.esipfed.org/mailman/private/esip-all/attachments/20160127/d344e9bc/attachment-0001.png>