<div dir="ltr"><div class="gmail_default" style="font-size:small">Hello ESIP Cloud Computing Cluster Members!<br><br>We are excited to welcome guest speakers Taylor Gowan and John Horel of the Department of Atmospheric Sciences, University of Utah and Cloudnine Weather to present on <b>Using Zarr to Store and Efficiently Access Output From Operational Numerical Weather Prediction Models. </b><i>Read the abstract, questions and meeting agenda below.</i></div><div class="gmail_default" style="font-size:small"><i><br></i></div><div class="gmail_default" style=""><i style=""><b style=""><font size="4">Meeting Logistics!</font></b></i></div><div class="gmail_default" style="font-size:small">Topic: Using Zarr to Store and Efficiently Access Output From Operational Numerical Weather Prediction Models.<br>Monday, April 26th, 1:00-2:00 pm ET/10:00-11:00 am PT<br><a href="https://us02web.zoom.us/j/86535177705?pwd=ay9yVDJ6UzNiSGRMWTFxbkNXdEJXUT09">https://us02web.zoom.us/j/86535177705?pwd=ay9yVDJ6UzNiSGRMWTFxbkNXdEJXUT09</a><br>Meeting ID: 865 3517 7705<br>Passcode: 354962<br>Find your local number: <a href="https://us02web.zoom.us/u/knxOPNBj5">https://us02web.zoom.us/u/knxOPNBj5</a><i><br></i></div><div class="gmail_default" style="font-size:small"><b><br></b></div><div class="gmail_default" style="font-size:small"><b>Abstract:</b></div><div class="gmail_default" style="font-size:small">Research and operations dependent on numerical weather prediction involve synthesizing vast amounts of continuously updating output grids that require viable, economical archival and retrieval solutions. From 2015 until recently, we maintained the only public archive of output from the High-Resolution Rapid Refresh (HRRR) forecast modeling system of the National Weather Service. That university-based archive has been relied upon by over a thousand registered users. 
Fortunately, our research group no longer needs to continue expanding beyond the current 130+ terabytes of HRRR model output in GRIB2 format (a file type that efficiently stores hundreds of two-dimensional variable fields for a single valid time) since Amazon and Google are doing so now as part of the Open Data Program of the National Oceanic and Atmospheric Administration.</div><br>Despite the highly compressible nature of GRIB2 files, they are on the order of several hundred megabytes each, making high-volume input/output applications challenging due to the memory and compute resources needed to parse these files. With support from the Amazon Sustainability Data Initiative, our group is creating and maintaining HRRR model output in an optimized format, Zarr, in a publicly accessible S3 bucket<span class="gmail_default" style="font-size:small"> </span>- hrrrzarr. For each model run and every variable, that bucket contains sets of analysis and forecast files sectioned into 96 small chunks. The structure of the HRRR-Zarr files is designed to give users the flexibility to access the data they need by selecting many small files for subdomains and parameters of interest without the overhead that comes from accessing GRIB2 files. The workflows required to generate the Zarr files and illustrations of use cases common to weather and machine learning applications will be presented.<div><br></div><div><b><span class="gmail_default" style="font-size:small">Discussion </span>Questions<span class="gmail_default" style="font-size:small"> - Choosing the right chunk shape</span>:</b><br><br>Many options exist for organizing N-dimensional data sets such as output from numerical weather prediction models. In our case, N=6 (x-longitude<span class="gmail_default" style="font-size:small">, </span>y-latitude, z-height, v-variable, t-model run time, f-model forecast time). 
We chose to generate x-y chunks and 3-D (f,<span class="gmail_default" style="font-size:small"> </span>x, y) Zarr cubes. What use cases would have benefited from alternative approaches in terms of chunk size, compression, and dimensions? Does it make sense to generate complete Zarr archives using other dimensional combinations, or to leave it to users to create their own repositories?</div><div><br><div class="gmail_default" style="font-size:small"><b>Agenda:</b></div><div class="gmail_default" style="font-size:small"><ul><li>20-30 minutes: Presentation</li><li>20-30 minutes: Discussion questions</li><li>10 minutes: Aimee, with credit to Sudhir Raj Shrestha, Rob Casey, and Rich Signell, provides an overview of the draft <a href="https://docs.google.com/document/d/1dcGFNohOnjPLQcYwY-W5Bxs8pbayXhwjRpPfL3i-UmA/edit#heading=h.pwl6y4r2af53">ESIP Cloud Computing Cluster 2021 Plan</a> and how cluster participants can get involved</li></ul><div><br></div><div>Looking forward to seeing y'all there!</div><div>Aimee and Sudhir</div></div><div class="gmail_default" style="font-size:small"><br></div></div></div>