[Esip-cloud] ESIP Cloud Computing Cluster April Telecon on Monday
James Coll
jamesmcoll at gmail.com
Thu Apr 21 16:25:27 EDT 2022
Announcement (apologies for cross-posting)
Hot on the heels of a very stimulating OGC event, we are excited to
welcome speaker Lucas Sterzinger, a PhD candidate at UC Davis, who will
present a deeper dive into kerchunk at this month's meeting on Monday. The
abstract and meeting agenda are below.
Meeting Logistics!
Topic: Kerchunk tutorial
Speaker: Lucas Sterzinger
Monday April 25th, 10:00-11:00 am PT / 1:00-2:00 pm ET
URL:
https://us02web.zoom.us/j/86535177705?pwd=ay9yVDJ6UzNiSGRMWTFxbkNXdEJXUT09
Meeting ID: 865 3517 7705
Passcode: 354962
Find your local number: Zoom International Dial-in Numbers
Abstract:
Many organizations are moving their data to cloud-hosted object storage,
which allows them greater flexibility in cost, dataset size, access, and
security. For multi-dimensional data, the Zarr format has emerged as a
popular cloud storage format, with consolidated metadata and data chunks
stored in separate objects that allow efficient parallel access.
NetCDF4/HDF5 files have been a community standard for decades and remain an
extremely popular format; however, they do not have consolidated metadata.
Without consolidated metadata, accessing this data requires many small
reads, resulting in poor performance on the cloud. Transforming the vast
existing NetCDF4/HDF5 data archives would require substantial computational
resources and create a duplicate of the dataset, doubling storage
requirements and complicating data version control, provenance, and archive
protocols. A potential solution to this problem is to create a consolidated
metadata file containing the byte-range locations of the data chunks and
use it to access the NetCDF4/HDF5 data. Kerchunk, along with
ReferenceFileSystem (a new part of the Intake group's fsspec project, which
provides local and remote file system interfaces for Python), performs this
task by creating a JSON file that allows a NetCDF4/HDF5 file to look like a
file system. The data can then be read efficiently using the Zarr library
directly. Using data from the GOES-East satellite hosted on Amazon Web
Services, we demonstrate the effectiveness of this approach and provide a
pathway to improving data access for the vast existing NetCDF4/HDF5 data
archives.
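For readers new to the idea, here is a minimal, stdlib-only sketch of what a reference file does. It is not kerchunk's or fsspec's code: the `read_reference` helper, the fake archive file, and the `rad` variable are hypothetical illustrations. It assumes a simplified layout in the spirit of kerchunk's JSON, where each Zarr key maps either to inline text (small metadata) or to a `[path, offset, length]` byte range inside an existing file.

```python
import json
import os
import struct
import tempfile


def read_reference(refs, key):
    """Return the bytes for one Zarr key from a (hypothetical) reference dict.

    A value is either an inline string (small metadata) or a
    [path, offset, length] list pointing at a byte range in another file.
    """
    value = refs[key]
    if isinstance(value, str):
        return value.encode("utf-8")
    path, offset, length = value
    with open(path, "rb") as f:
        f.seek(offset)          # jump straight to the chunk
        return f.read(length)   # one ranged read; the header is never parsed


# Stand-in "archive file" (NOT real HDF5): a 16-byte header followed by
# two chunks of four little-endian float32 values each.
header = b"FAKEHDR-NOT-HDF5"
chunk0 = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
chunk1 = struct.pack("<4f", 5.0, 6.0, 7.0, 8.0)
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".bin")
tmp.write(header + chunk0 + chunk1)
tmp.close()

# Hypothetical reference set: Zarr metadata inline, chunks as byte ranges.
refs = {
    ".zgroup": json.dumps({"zarr_format": 2}),
    "rad/.zarray": json.dumps({
        "zarr_format": 2, "shape": [8], "chunks": [4], "dtype": "<f4",
        "compressor": None, "fill_value": None, "filters": None, "order": "C",
    }),
    "rad/0": [tmp.name, len(header), len(chunk0)],
    "rad/1": [tmp.name, len(header) + len(chunk0), len(chunk1)],
}

meta = json.loads(read_reference(refs, "rad/.zarray"))
second_chunk = struct.unpack("<4f", read_reference(refs, "rad/1"))
print(meta["chunks"], second_chunk)  # [4] (5.0, 6.0, 7.0, 8.0)
os.remove(tmp.name)
```

Reading `rad/1` touches only 16 bytes of the original file, and the original file is never copied or rewritten. In the real workflow, fsspec's ReferenceFileSystem plays the role of `read_reference` (including for remote objects such as the GOES-East data on AWS) and exposes the keys to Zarr; the tutorial repository below shows the actual tools.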
Before the meeting, please clone the repo and set up the environment as
outlined in the README at
https://github.com/lsterzinger/2022-esip-kerchunk-tutorial
Agenda:
5-10 minutes - Announcements: ESIP Summer Meeting planning; open call for
other announcements
When you find a spare minute, please fill out the ESIP summer meeting
interest poll at https://forms.gle/iGjBVEjx5nUxkBRY7
20-40 minutes - Hands on Presentation
10-20 minutes - Discussion and questions
Hope you can make it!
Jim