[Esip-datareadiness] "Data Quality Assessment for Machine Learning Tasks" TODAY at 2:00 p.m. Eastern Time
khalsa at Colorado.EDU
Wed Jan 19 12:05:24 EST 2022
Hello,
A presentation (TODAY) in NASA's ESDSWG on Data Interoperability that
may be of interest:
Hello DIML WG:
A DIML Telecon will take place on Wednesday, January 19 starting at 2:00
p.m. Eastern Time.
Farnoush Banaei-Kashani will speak on "Data Quality Assessment for
Machine Learning Tasks" at this Telecon - the abstract for his
presentation and his bio can be found below.
Here is the Earthdata wiki page for this Telecon:
https://wiki.earthdata.nasa.gov/display/ESDSWG/2022-01-19+DIML+Meeting+Notes
Here is the WebEx link for this Telecon:
https://nasaenterprise.webex.com/nasaenterprise/j.php?MTID=m6e774a5a151763a311db4d976b4c5aea
Thanks.
Best wishes, Peter L.
Data Quality Assessment for Machine Learning Tasks
Abstract: It is well understood from the literature that the performance of
a machine learning (ML) model is upper bounded by the quality of the
data. While researchers and practitioners have focused on improving the
quality of models, there are limited efforts towards improving the data
quality. One of the crucial requirements before consuming datasets for
any application is to understand the dataset at hand, and failure to do
so can result in inaccurate analytics and unreliable decisions.
Assessing the quality of the data with intelligently designed metrics
and developing corresponding data transformation operations to address
the quality gaps help reduce the effort a data scientist spends on
iterative debugging of the ML pipeline to improve model performance.
This talk highlights the importance of analyzing data quality in terms
of its value for machine learning applications.
Bio: Farnoush Banaei-Kashani is currently an associate professor at the
Department of Computer Science and Engineering, University of Colorado
Denver, where he directs two US DoEd GAANN PhD Fellowship Programs in
"Big Data Science and Engineering" and "Data-Driven Cyber Security", as
well as an MS Program in "Data Science in Biomedicine". Dr.
Banaei-Kashani is passionate about performing fundamental research
toward building practical, large-scale data-intensive systems, with
particular interest in Data-driven Decision-making Systems (DDSs), i.e.,
systems that automate the process of decision-making by applying data
scientific solutions to (big) data. Toward this end, he has organized
his research and education activities around two tracks: a Data Science
track and a Data Management, Mining and Modeling track. With the Data
Science track, his team engages with real-world problems that can
benefit from data scientific solutions, consisting of all data
life-cycle components from data collection and extraction, to management
and querying, to learning and mining, to visualization and storytelling,
all given various combinations of the V3 Big Data challenges. In
particular, his lab has experience with a number of data-driven
decision-making systems (DDSs) from various application areas, such as
health informatics, personalized medicine, computational biology,
intelligent edge computing and IoT, intelligent transportation, and
geospatial analysis. The Data Science track complements the Data
Management, Mining and Modeling track by providing practical real-world
problems, which Dr. Banaei-Kashani's team generalizes, formalizes, and
rigorously studies as novel data management and mining as well as
machine learning problems. Dr. Banaei-Kashani has published more than 70
refereed papers and has received several awards, including an ACM SenSys
2016 Best Paper Award and an IEEE CloudCom 2010 Best Paper Award. He
frequently serves as a program committee member of highly ranked data
science and database conferences (including KDD, VLDB, ICDE and
SIGSPATIAL), and has also chaired many workshops and conferences over
the last few years, most recently the ACM SIGSPATIAL 2020 Conference. Dr.
Banaei-Kashani's research has been supported by grants from both
governmental agencies (NIH, NSF, DOE, DOD, DOT, DoEd, and DOJ) and
industry (Google, IBM, Chevron, Intel and United Healthcare).