[Esip-machinelearning] Application of Scalable Deep Learning Techniques for Analysis of Massive Scale Science Data Archives (AMASSD)

Mcgibbney, Lewis J (398M) lewis.j.mcgibbney at jpl.nasa.gov
Fri Aug 16 12:21:44 EDT 2019


Application of Scalable Deep Learning Techniques for Analysis of Massive Scale Science Data Archives (AMASSD) will develop and leverage expertise at JPL and the University of Colorado Boulder’s Laboratory for Atmospheric and Space Physics (LASP) to deliver a rich suite of scalable deep learning techniques that increase the long-term use and value of massive Earth science data archives, e.g. those at PO.DAAC and LASP.

PO.DAAC and LASP share similar long-term strategic mission goals: each is tasked with utilizing expertise in science, engineering, mission operations, and scientific data analysis to preserve discipline-specific data and make them universally accessible and meaningful. It is no secret, however, that advances in instrument technology (e.g. NASA’s forthcoming SWOT mission) are producing massive volumes of increasingly complex data and challenging the traditional roles of these institutions. Now more than ever, the two biggest barriers to driving innovation in the long-term use and value of massive Earth science data are data volume and computational capability. AMASSD will provide long-term benefits for JPL and LASP by focusing on two strategic areas of both organizations: i) project-level: the holy grail for data analysis at PO.DAAC and LASP is to be able to choose from and execute data mining/machine learning algorithms over massive data volumes; and ii) discipline-level: deep learning as applied to data science is a technique still maturing at both stakeholder institutions.
Our graduate student investigator, Shawn Polson, is a first-year master’s student at UC Boulder and a Graduate Research Assistant at LASP who has previously completed a successful Senior Capstone project with JPL. He will begin his master’s thesis at UC Boulder, which will aim to apply machine learning for generalized anomaly detection to satellite telemetry data. We have scoped his initial involvement at 25-40% for the length of his master’s thesis, with his involvement stabilizing at 40% once his thesis is complete and AMASSD is in year 2. We have also negotiated the provision and sharing of computing resources such that the project team can utilize existing hardware at LASP.
The AMASSD objectives are to:

  1.  Utilize existing enterprise-grade open source deep learning frameworks, e.g. Apache MXNet, to develop a suite of deep learning approaches which can be tested and validated on institutionally relevant, massive-scale, high-quality datasets residing at PO.DAAC and LASP.
  2.  Open-source the AMASSD software and documentation artifacts such that we can attract, nurture and grow a thriving community of users around the project.
  3.  Infuse AMASSD into user acceptance testing environments at PO.DAAC and LASP with the aim of achieving full software infusion within 2-3 years of the project start.
Our approach for achieving the above objectives is as follows:

  1.  Mapping to Objective #1 – work with our project science stakeholders (Dr. Gierach and Dr. Tsontos) to solidify existing use cases, including the prediction of atmospheric events such as El Niño and of inland flooding events which result from hurricane formation. We will utilize Dr. Mandrake’s machine learning expertise to guide algorithm development.
  2.  Mapping to Objective #2 – Dr. McGibbney is well recognized within the NASA open source community, in Earth and Space science informatics, and at the Apache Software Foundation. He has a proven track record of developing successful open source software and communities.
  3.  Mapping to Objective #3 – Drs. McGibbney, Gierach and Tsontos currently work on the PO.DAAC project. Between them, they have experience overseeing the PO.DAAC Lab’s incubator project. They will work diligently with the anticipated recipients of the AMASSD software to ensure software infusion.
The AMASSD partnership will focus on advancing the state of the art in deep learning techniques as applied to massive data archives, a highly relevant, emerging topic central to advancing the Data Science discipline at JPL and LASP alike. Our grand vision is for AMASSD to systematically drive innovation in the Data Science discipline at both institutions.


Dr. Lewis John McGibbney Ph.D., B.Sc.(Hons)
Data Scientist III
Computer Science for Data Intensive Applications Group (398M)
Instrument Software and Science Data Systems Section (398)
Jet Propulsion Laboratory
California Institute of Technology
4800 Oak Grove Drive
Pasadena, California 91109-8099
Mail Stop : 158-256C
Tel:  (+1) (818)-393-7402
Cell: (+1) (626)-487-3476
Fax:  (+1) (818)-393-1190
Email: lewis.j.mcgibbney at jpl.nasa.gov
ORCID: orcid.org/0000-0003-2185-928X

 Dare Mighty Things