Hey, would a presentation on Tahoe-100M be a good one for this?
We had a lot of fun with this project. From idea to preprint, it was only 4 months!
Team work makes the dream work!
Just the start of a movement
Yup, split-pool was an early key choice we made. The data is really not bad, though: about 1 TB, and you'll probably see some Dask-style data structures that remove RAM limitations very soon. So a solid-state drive is all you need, and that'll be easy for a 1 TB dataset.
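To make the out-of-core idea concrete: a reduction like per-gene total counts never needs the whole ~1 TB matrix in RAM if cells are streamed in fixed-size chunks, because memory then scales with one chunk plus the running result. Dask-style chunked arrays apply the same principle with parallel scheduling. Below is a minimal stdlib-only sketch, assuming cells arrive as sparse dicts (gene index to count) from some reader; `iter_cell_chunks` and `gene_totals` are hypothetical illustrative names, not an anndata or Dask API.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List

# Hypothetical sparse cell format: gene index -> UMI count.
Cell = Dict[int, int]


def iter_cell_chunks(cells: Iterable[Cell], chunk_size: int) -> Iterator[List[Cell]]:
    """Yield cells in lists of at most `chunk_size` (stand-in for an on-disk chunked reader)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    chunk: List[Cell] = []
    for cell in cells:
        chunk.append(cell)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # flush the final, possibly smaller, chunk
        yield chunk


def gene_totals(chunks: Iterable[List[Cell]]) -> Counter:
    """Sum counts per gene while holding only one chunk plus the running totals in memory."""
    totals: Counter = Counter()
    for chunk in chunks:
        for cell in chunk:
            totals.update(cell)  # Counter.update adds counts for each gene key
    return totals


if __name__ == "__main__":
    # Toy data: 5 cells over 3 genes, streamed two cells at a time.
    cells = [{0: 3, 2: 1}, {1: 4}, {0: 1, 1: 1}, {2: 5}, {0: 2}]
    totals = gene_totals(iter_cell_chunks(iter(cells), chunk_size=2))
    print(dict(sorted(totals.items())))  # {0: 6, 1: 5, 2: 6}
```

Swapping the toy list for a real chunked reader keeps the memory profile the same: only `chunk_size` cells are resident at once.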
This is a good question
Let's go!
We're excited! Stay tuned, as several models built on this data are going to come out in quick succession this year.
Keep an eye on this - we're just getting started! Hope to have the ML community engaged as we continue in this direction.
@thejohnnyyu.bsky.social, @therealnima.bsky.social, and I are excited to tell you about Tahoe-100M, the largest publicly available single-cell dataset, measuring the effect of 1,200 drug perturbations on 50 cell line models. The Vevo team has outdone itself. #Tahoe100M www.biorxiv.org/content/10.1...
This was all made possible by the Mosaic platform! What is Mosaic? @thejohnnyyu.bsky.social took his work in our lab, and scaled it in every dimension… Mosaic brings a highly diversified, exquisitely optimized, and optimally balanced “cell village” approach to perturbation data collection.
If you are intrigued by this, and if you're working on AI/ML, single-cell biology, or drug discovery, I urge y'all to reach out to @thejohnnyyu.bsky.social, @therealnima.bsky.social or any of the @vevotherapeutics.bsky.social team. www.prnewswire.com/news-release...
Watch @thejohnnyyu.bsky.social and @therealnima.bsky.social (@vevotherapeutics.bsky.social), @pdhsu.bsky.social, Dave Burke, and me (@arcinstitute.org) talking about virtual cells and how #Tahoe100M, now on @arcinstitute.org's Virtual Cell Atlas, can change the game!
www.youtube.com/watch?v=ak_f...
Our latest from the indefatigable @thejohnnyyu.bsky.social in collaboration with the Weissman and Shokat labs. Meet GENEVA, which enables simultaneous phenotyping and profiling of cancer cell drug responses at scale, both in vitro and in vivo, across a variety of models: www.biorxiv.org/content/10.1...
This will be instrumental for datasets like our Tahoe-100M, especially as we scale to normalizing 100-million-cell datasets.
scRNA-seq datasets are exploding in number and size. Check out scanpy & anndata for >1B cells: a new experimental update includes anndata APIs for scaling with Dask, integrated with many scanpy and rapids-singlecell functions.
gist.github.com/ilan-gold/98...
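Normalization fits this chunked, Dask-backed storage well because the standard library-size step is row-local: each cell's counts are divided by that cell's own total, scaled to a target sum, and log-transformed, so chunks can be processed independently and in parallel. Here is a minimal stdlib sketch of that per-cell transform, assuming sparse dict cells and an illustrative `target_sum` of 10,000 (not necessarily what Tahoe-100M uses). `normalize_cell` and `normalize_chunks` are hypothetical helpers; in scanpy the analogous steps are `sc.pp.normalize_total` followed by `sc.pp.log1p`.

```python
import math
from typing import Dict, Iterable, Iterator, List

# Hypothetical sparse cell format: gene index -> count.
Cell = Dict[int, float]


def normalize_cell(cell: Cell, target_sum: float = 1e4) -> Cell:
    """Scale one cell to `target_sum` total counts, then apply natural-log log1p."""
    total = sum(cell.values())
    if total == 0:
        # All-zero cell: every log1p value would be 0, so return an empty sparse row.
        return {}
    scale = target_sum / total
    return {gene: math.log1p(count * scale) for gene, count in cell.items()}


def normalize_chunks(chunks: Iterable[List[Cell]], target_sum: float = 1e4) -> Iterator[List[Cell]]:
    """Normalize chunk by chunk; no chunk depends on another, so this can stream or run in parallel."""
    for chunk in chunks:
        yield [normalize_cell(cell, target_sum) for cell in chunk]


if __name__ == "__main__":
    chunks = [[{0: 2, 1: 2}], [{3: 10}]]
    for out in normalize_chunks(chunks, target_sum=100):
        print(out)
```

Because no per-gene statistic is needed at this step, the chunking of the stored matrix can be chosen for I/O efficiency alone.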
keep in touch!
100M dataset!