[ESIP-all] ESIP Data Readiness Cluster February Meeting – Croissant: A High-Level Metadata Vocabulary for ML Datasets

Douglas Rao - NOAA Affiliate douglas.rao at noaa.gov
Wed Feb 14 13:45:18 EST 2024


Dear all,

The ESIP Data Readiness Cluster invites you to join our February cluster
call, featuring an invited presentation from Omar Benjelloun (software
engineer at Google).

During this meeting, Omar will present the MLCommons Croissant working
group's recent work on Croissant, a high-level metadata vocabulary for
machine learning datasets. Please see the details below.

*Time*: 02/20, 1 pm ET (UTC-5)
*Where*: See ESIP Community Calendar for the Zoom link (
https://www.esipfed.org/get-involved/community-calendar).

*Abstract*: Croissant is an open, community-built, standardized metadata
vocabulary for ML datasets, including key attributes and properties of
datasets, as well as information required to load these datasets in ML
tools. Croissant enables data interoperability between ML frameworks and
beyond, which makes ML work easier to reproduce and replicate.
This talk will provide an overview of the Croissant format, and demonstrate
its benefits 1) for dataset consumers, who can search for Croissant
datasets, access their metadata on ML repositories like Kaggle and
Hugging Face, and load them into popular ML frameworks like TensorFlow, JAX,
and PyTorch, and 2) for dataset creators, who can use the Croissant editor
to easily create, modify, and validate datasets in the Croissant format.

*Speaker*: Omar Benjelloun is a software engineer at Google, where he has
developed data-focused products (Google Public Data Explorer, Google
Dataset Search) and Search features (media reviews, public statistics
answers, related entities, …) for over a decade and a half. Prior to
joining Google, Omar received a PhD in Databases from INRIA / University of
Paris Orsay, and spent two years as a postdoc in the Database group at
Stanford University.

Please join us on 02/20 via the ESIP Community Calendar (
https://www.esipfed.org/get-involved/community-calendar).

-Douglas

-- 
Douglas Rao, Ph.D. (he/him/his)
Research Scientist
North Carolina State University <http://www.ncsu.edu>
Cooperative Institute for Satellite Earth System Studies (CISESS)
<http://www.ncics.org>
NOAA National Centers for Environmental Information
<http://www.ncei.noaa.gov>
151 Patton Ave. Asheville, NC 28801
+1.828.271.4903

I work on a flexible work schedule and across a number of time zones.
Apologies for any out of hours email.

"Every individual matters. Every individual has a role to play. Every
individual makes a difference." – *Dr. Jane Goodall*