[ESIP-all] 7 AGU Sessions addressing: Science Data Analytics; Data Science

Kempler, Steven J. (GSFC-5860) via ESIP-all esip-all at lists.esipfed.org
Mon Jul 7 22:08:02 EDT 2014

In case you have not heard, AGU is coming again this year... in December.

Please allow me to draw your attention to 7 sessions that specifically address Science Data Analytics and/or Data Science (listed below for your convenience).  Note that the first one targets the education-minded community.  The others are associated with Earth and Space Science Informatics, but the conveners are very interested in insights and experiences from the physical science communities, as well as from the informatics community.  See you in SF.



Teaching Science Data Analytics Skills Needed to Facilitate Heterogeneous Data/Information Research:  The Future Is Here

Session ID#: 1879
Session Description:

Scientists are increasingly exploring heterogeneous data analysis methodologies to tease out information and knowledge. Science data analytics techniques need to be well understood and advanced in order to maximize cross-dataset integration and usability. The data analytics skills that Data Scientists need to better understand non-obvious relationships across datasets are becoming more appreciated and significant, given the increasing amount of heterogeneous data available.  This session seeks papers that: describe the science-domain-oriented data scientist (and data analytics) training that university and non-university programs provide to students; and describe the science-research-oriented analytics skills and expertise that need to be taught so that students can move into high-demand, science domain data scientist (data analytics) positions.  Topics covered include curricula and science data research projects that teach or use machine learning, statistics, data mining, decision support modeling, or other analytics techniques.

Primary Convener:
Steven J Kempler
Emily Law, Sara J Graves, Chung-Lin Shie



Identifying and Better Understanding Data Science Activities, Experiences, Challenges, and Gaps Areas

Session ID#: 1809
Session Description:

Today, industries are calling Data Science “the sexiest job of the 21st century”. But how do Data Scientists contribute to scientific research?  What experiences, challenges, and solutions have Data Scientists had that future Data Scientists can learn from?  What do Data Scientists need to know to support Earth and Space science research?  This session seeks papers that describe Data Science activities, experiences, and challenges, as well as the expertise and skills Data Scientists need.  Areas that may be covered include data lifecycle phases: data modelling, acquisition, cleaning, integration, analysis, and interpretation, each of which introduces challenges, problems, and solutions.  We invite papers that address:
   Type of work a Data Scientist performs
   Data Science experiences and lessons
   Data Science challenges
   Data Science top problems (and solutions).  For example:
      Ensuring data and meta-data consistency
      Maintaining analytics expertise per science domain
      Supporting quality and uncertainty
      Advancing data analytics techniques

Primary Convener:
Emily Law
John S Hughes and Steven J Kempler


Advancing Analytics using Big Data Climate Information System

Session ID#: 3022
Session Description:

Earth system science has seen a massive increase in both observational data and modeling outputs. This constitutes a Big-Data challenge that demands Big-Data technologies to address.   However, it is difficult for individual investigators or research groups to implement the petabyte-scale platforms required to tackle the data analyses needed. It is also increasingly obvious that we must share and leverage our infrastructure investments in order to scale our research and development efforts and to increase scientific productivity or throughput. Thus, in this session we seek presentations on innovative techniques, systems, or infrastructures that address the petascale data analysis and collaborative research and development challenges.
The focus of this session is on data analytics, rather than search, access, or curation. Subtopics of interest include:
Science applications focusing on integrating climate modeling and satellite observations.
Techniques, systems, infrastructures that enable seamless collaborations.
Innovative approaches of interactive visualization enhancing analytics of large data sets.

Primary Convener:
Kwo-Sen Kuo
Tsengdar J Lee, Michael S Seablom, Ramakrishna R Nemani


Big Data in the Geosciences: New Analytics Methods and Parallel Algorithms

Session ID#: 3292
Session Description:

Earth and space science data are increasingly large and complex--often representing high spatial/temporal/spectral resolution and dimensions from remote sensing or model results--making such data difficult to analyze, visualize, interpret, and understand by traditional methods.  This session focuses on application and development of new geoscientific data analytics approaches (statistical, data mining, assimilation, machine learning, etc.) and parallel algorithms and software employing high performance computing resources for scalable analysis and novel applications of traditional methods on large geoscience data sets.  Analysis methods that operate in-situ with parallel simulations to reduce output data volumes are also of interest.  Abstracts focused on analysis, synthesis and knowledge extraction from large and complex Earth science data from all disciplines are invited.

Primary Convener:
Jitendra Kumar
Robert L Jacob, Forrest M Hoffman, Miguel D Mahecha


Leveraging Enabling Technologies and Architectures to enable Data Intensive Science

Session ID#: 3041
Session Description:

The objective of this session is to share innovative concepts, emerging solutions, and applications for Big Earth and Space Data to enable Data-Intensive Science. Data-Intensive Science defines three high-level activities: capture, curation, and analysis of data. Being able to handle massive amounts of data impacts our architectural decisions and approaches. Topics include demonstrations, studies, methods, and/or architectural discussions on:
Common enabling technologies
Automated techniques for data analysis
Science analysis and visualization
Real time decision support
Implications of Data-Intensive Science for education
Data management lifecycle functions from data capture through analysis
Architecture that spans multiple data systems and organizations

Primary Convener:
Thomas Huang
Rahul Ramachandran, Daniel J Crichton, Morris Riedel


Open source solutions for analyzing big earth observation data

Session ID#: 3080
Session Description:

Most current earth observation data has become freely available, but has also become too large to download and analyse on local machines. Several solutions exist to analyse "near the data", e.g. array databases, solutions built on Hadoop, solutions that use R or Python to organize a cluster, or Google Earth Engine. Not all of these are open source and hence suitable for transparent, reproducible scientific research purposes. This session will attract papers that present solutions to and experiences with analysing big earth observation data near the data, using open source software. It will also accept contributions with non-open-source solutions that are willing to discuss transparency and reproducibility.

Primary Convener:
Edzer J Pebesma
James Frew, Robert J Hijmans, Jonathan A Greenberg


Technology Trends for Big Science Data Management

Session ID#: 2525
Session Description:

The technology trend toward the use of ontologies, models and information representation [1] is predicted to favorably impact system architectures for big science data management. Data scientists in the space and earth sciences are developing system architectural components that incorporate these technologies into the data lifecycle - from ground systems through to the archives - and that help drive science analysis by producing interoperable systems and correlatable data. Technologies include ontologies for model-driven development, science and engineering discipline ontologies, metadata and provenance standards vocabularies, and the use of semantic infrastructure for integration, publication, and analysis of science data to promote cross-disciplinary studies.  This session invites papers on these and related technologies that are intended to improve the discovery and correct use of data and help meet the expectations of modern scientists in the Big Data era.  [1] National Research Council (U.S.). Frontiers in Massive Data Analysis. 2013.

Primary Convener:
John S Hughes
Daniel J Crichton, Yolanda Gil, Bernd Ritschel
