[Esip-cloud] Please join us Monday at 10am PT / 1pm ET for the next knowledge sharing session on fsspec's ReferenceFileSystem led by Martin Durant
Aimee Barciauskas
aimee at developmentseed.org
Tue May 18 14:35:37 EDT 2021
*Meeting Logistics!*
Topic: *fsspec's ReferenceFileSystem: A Virtual View of the Binary Chunks
of any URLs on Another Storage Backend*
Monday, May 24th, 10-11am PT / 1-2pm ET
https://us02web.zoom.us/j/86535177705?pwd=ay9yVDJ6UzNiSGRMWTFxbkNXdEJXUT09
Meeting ID: 865 3517 7705
Passcode: 354962
Find your local number: https://us02web.zoom.us/u/knxOPNBj5
*Abstract:*
The size of datasets needed for scientific applications is such that
downloading files to a laptop is no longer feasible. Rather, we must move
compute to the data, and access data directly on cloud
storage services. This has led to new, specialized data formats, such as
Zarr and Cloud-Optimised GeoTIFF that enable chunk-wise and parallel IO, as
championed, for example, by the Pangeo collaboration.

Many public agencies
require their archival data to be stored in a recognised standard format
like HDF5. However, translating a dataset to a cloud-optimized format
amounts to making a copy of the original, and so uses up
resources.

ReferenceFileSystem
is an fsspec implementation which gives a virtual view onto the binary
chunks of any URLs on another storage backend. Since HDF5 (and other) data
is stored internally as binary encoded chunks, if we can map which chunk
belongs where, we can present these binary chunks to Zarr and read the
original data as a Zarr dataset, thus eliminating any conversion step.
Therefore, at the cost of a single scan of the original data (where we store the
metadata of the chunk offsets), we get performant parallel IO on many
binary formats in archival storage.
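
To make the chunk-mapping idea concrete, here is a minimal sketch in plain Python. It is not fsspec's actual implementation or API: the file layout, the `references` table, the `data/0`-style keys, and the `read_chunk` helper are all hypothetical, and a local temporary file stands in for an object on cloud storage. It only illustrates how a table of (location, offset, length) entries, recorded once, lets a reader fetch individual chunks without copying or converting the original file.

```python
import os
import tempfile

# Hypothetical "archival" file: a small header followed by two binary chunks.
header = b"HDR!"
chunk0 = bytes(range(0, 8))
chunk1 = bytes(range(8, 16))

path = os.path.join(tempfile.mkdtemp(), "archive.bin")
with open(path, "wb") as f:
    f.write(header + chunk0 + chunk1)

# Result of the one-time scan: chunk key -> (location, offset, length).
# The key names are illustrative; they imitate chunk-style naming only.
references = {
    "data/0": (path, len(header), len(chunk0)),
    "data/1": (path, len(header) + len(chunk0), len(chunk1)),
}


def read_chunk(key):
    """Return one chunk's bytes by seeking into the original file."""
    location, offset, length = references[key]
    with open(location, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Each chunk is fetched independently, so reads could run in parallel.
first = read_chunk("data/0")
second = read_chunk("data/1")
```

Because every lookup is an independent byte-range read, the same table could in principle point at ranges of remote URLs rather than a local path, which is the situation the abstract describes.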
Agenda:
* 20-30 minutes - Presentation
* 20-30 minutes - Discussion questions
* 10 minutes - Announcements: Working sessions, ESIP Summer Meeting
Planning, Open Call for other announcements

Looking forward to seeing y'all there!