Super excited to see #icechunk v1.0 ship today. Stable format, stable API, and ready for production. Take it for a spin and let us know how it goes. 🚀🚀🚀
Latest posts tagged with #Icechunk on Bluesky
Some Postdoc throwback today riding the NJtransit to Princeton.
On my way to the #WCRP km scale hackathon for the week.
www.wcrp-esmo.org/activities/w...
Excited to play around with #healpix #zarr #icechunk and some super high res data.
I'll be at the CNG conference in Snowbird next week. I wrote a short blog post about what the Earthmover team will be up to.
tl;dr: we'll be talking about @zarr.dev, #icechunk, @xarray.bsky.social and cloud-native data cubes.
Details in the blog post 👇
Most people think of @zarr.dev as a "file format". With #Icechunk, we've turned Zarr into a database. @functionth.bsky.social's post shows how Icechunk can be used to solve a problem where transactional databases are often required.
2/ Could you run a bank ledger on #Icechunk? A multidimensional array store designed for scientific data probably isn’t the first thing that comes to mind for this application... But surprisingly, it totally works!
@zarr.dev and #icechunk are amazing but they are not magic. They are part of a thoughtfully designed cloud-native data architecture. @tegnicholas.bsky.social peels back the covers on cloud-optimized scientific data formats in our latest "Fundamentals" post 👇
We found similar results when we first benchmarked #icechunk. Our conclusion: doing IO with a Rust backend is much faster than Python.
👇Really exciting to see @kylebarron.dev's Obstore backend for Zarr-Python ship today.
4/ 📒 How cloud-optimized formats are structured
🧊 How @zarr.dev and #Icechunk are designed to work efficiently in the cloud
🤑 How this saves you money
Training AI models at scale from data stored in cloud object storage requires thinking carefully about both bandwidth and concurrency. In this post, @functionth.bsky.social gets into the details of concurrent reads at scale, showing how #Icechunk and S3 can easily scale beyond 200k requests/second!
Along the way, he dispels pervasive myths about how S3 prefixes work and the limits that key names impose on scalability. Relevant not just for #Icechunk but for any cloud data system (including Apache Iceberg) that stores data across many objects in object storage.
I share @rabernat.bsky.social's excitement about icechunk!!! On top of delivering 100x performance, it can make impossible tasks possible.
Why am I so excited to endorse #icechunk and #virtualizarr?
bsky.app/profile/eart...
🚨 New blog post 🚨
In it, we show off our recent work deploying #icechunk on top of #NASA's existing archives of Earth observation data. The result: a 100x speedup when extracting time series from existing datasets stored as netCDF.
2/ In the pilot, we used our new open source tensor storage engine #Icechunk and #VirtualiZarr to present archival NetCDF data stored in S3 as a single analysis-ready cloud-optimized (ARCO) dataset.
This session is going to be a blast! If you are headed to CNG next month (and you should be!), consider joining us for this workshop on @xarray.bsky.social, @zarr.dev, and #icechunk. 👇👇👇
Join our webinar, Data Version Control for Arrays with #Icechunk. @rabernat.bsky.social will explain how Icechunk’s transactions, snapshots, tags, + branches can add safety & flexibility to data pipelines and workflows. Register here: bit.ly/3WVPSf2
The 3.0.0 release clears the way for a bunch of exciting extensions built on top of the v3 spec. #icechunk, variable chunking, new dtypes, and more are all now possible. Time to get busy.
The parallels between MotherDuck’s ddx storage system and @earthmoverhq.bsky.social’s #icechunk are uncanny, not to mention the shared mission to create cloud-native DBs.
With this architecture, we showed that we can easily scale a simple OPeNDAP service, sitting on top of an #icechunk repository in S3, to thousands of requests per second. 🚀
In the talk, I made a few simple points:
- Separation of storage and compute is key to unlocking the scaling potential of the cloud
- Cloud-optimized data formats are key (example: @zarr.dev and #icechunk)
- API services should be stateless/serverless and should be able to scale horizontally [0->N]
Also on Thursday afternoon, I'll be giving an invited talk titled "Seamless Arrays: A Full Stack, Cloud-Native Architecture for Fast, Scalable Data Access". It combines all that we've been working on for the past year including @zarr.dev v3, #icechunk, and Xpublish.
agu.confex.com/agu/agu24/me...
Monday through Thursday, I'll be hanging out with @rabernat.bsky.social at the @earthmoverhq.bsky.social booth in the exhibit hall. Swing by to say hello or to snag some swag/stickers/etc. We'll also be demoing #icechunk all week.