[Esip-datareadiness] "Data Quality Assessment for Machine Learning Tasks" TODAY at 2:00 p.m. Eastern Time

khalsa at Colorado.EDU khalsa at Colorado.EDU
Wed Jan 19 12:05:24 EST 2022


Hello,

A presentation (TODAY) in NASA's ESDSWG on Data Interoperability that 
may be of interest:

Hello DIML WG:

A DIML Telecon will take place on Wednesday, January 19 starting at 2:00 
p.m. Eastern Time.

Farnoush Banaei-Kashani will speak on "Data Quality Assessment for 
Machine Learning Tasks" at this Telecon - the abstract for his 
presentation and his bio can be found below.

Here is the Earthdata wiki page for this Telecon:
https://wiki.earthdata.nasa.gov/display/ESDSWG/2022-01-19+DIML+Meeting+Notes

Here is the WebEx link for this Telecon:
https://nasaenterprise.webex.com/nasaenterprise/j.php?MTID=m6e774a5a151763a311db4d976b4c5aea

Thanks.

Best wishes, Peter L.


Data Quality Assessment for Machine Learning Tasks

Abstract: It is well understood from the literature that the performance 
of a machine learning (ML) model is upper-bounded by the quality of the 
data. While researchers and practitioners have focused on improving the 
quality of models, there have been limited efforts towards improving 
data quality. One of the crucial requirements before consuming datasets 
for any application is to understand the dataset at hand; failure to do 
so can result in inaccurate analytics and unreliable decisions. 
Assessing the quality of the data with intelligently designed metrics 
and developing corresponding data transformation operations to address 
the quality gaps help reduce the effort a data scientist spends on 
iteratively debugging the ML pipeline to improve model performance. 
This talk highlights the importance of analyzing data quality in terms 
of its value for machine learning applications.

Bio: Farnoush Banaei-Kashani is currently an associate professor at the 
Department of Computer Science and Engineering, University of Colorado 
Denver, where he directs two US DoEd GAANN PhD Fellowship Programs in 
"Big Data Science and Engineering" and "Data-Driven Cyber Security", as 
well as an MS Program in "Data Science in Biomedicine". Dr. 
Banaei-Kashani is passionate about performing fundamental research 
toward building practical, large-scale data-intensive systems, with 
particular interest in Data-driven Decision-making Systems (DDSs), i.e., 
systems that automate the process of decision-making by applying data 
scientific solutions to (big) data. Toward this end, he has organized 
his research and education activities around two tracks: a Data Science 
track and a Data Management, Mining and Modeling track. With the Data 
Science track, his team engages with real-world problems that can 
benefit from data scientific solutions, consisting of all data 
life-cycle components from data collection and extraction, to management 
and querying, to learning and mining, to visualization and storytelling, 
all given various combinations of the V3 Big Data challenges. In 
particular, his lab has experience with a number of data-driven 
decision-making systems (DDSs) from various application areas, such as 
health informatics, personalized medicine, computational biology, 
intelligent edge computing and IoT, intelligent transportation, and 
geospatial analysis. The Data Science track complements the Data 
Management, Mining and Modeling track by providing practical real-world 
problems, which Dr. Banaei-Kashani's team generalizes, formalizes, and 
rigorously studies as novel data management and mining as well as 
machine learning problems. Dr. Banaei-Kashani has published more than 70 
refereed papers and has received several awards, including an ACM SenSys 
2016 Best Paper Award and an IEEE CloudCom 2010 Best Paper Award. He 
frequently serves as a program committee member of highly ranked data 
science and database conferences (including KDD, VLDB, ICDE and 
SIGSPATIAL), and has also chaired many workshops and conferences over 
the last few years, most recently the ACM SIGSPATIAL 2020 Conference. Dr. 
Banaei-Kashani's research has been supported by grants from both 
governmental agencies (NIH, NSF, DOE, DOD, DOT, DoEd, and DOJ) and 
industry (Google, IBM, Chevron, Intel and United Healthcare).