Johnny Yu (@thejohnnyyu)

Hey, would a presentation on Tahoe-100M be a good one for this?

01.03.2025 17:49 👍 0 🔁 0 💬 1 📌 0

We had a lot of fun with this project. From idea to preprint it was only 4 months! ⚡

01.03.2025 17:14 👍 0 🔁 0 💬 0 📌 0

Team work makes the dream work!

01.03.2025 17:11 👍 0 🔁 0 💬 0 📌 0

Just the start of a movement

01.03.2025 17:10 👍 0 🔁 0 💬 0 📌 0

Yup, split-pool was a key early choice we made. The data size is really not bad though: it's 1 TB, and you'll probably see some Dask-style data structures that remove RAM limitations super soon. So with just a solid-state drive, a 1 TB dataset will be easy to handle.

01.03.2025 17:09 👍 1 🔁 0 💬 0 📌 0

This is a good question

01.03.2025 17:07 👍 1 🔁 0 💬 0 📌 0

Let's go 🚀🌕

01.03.2025 17:06 👍 0 🔁 0 💬 0 📌 0

We're excited! Stay tuned as this year several models built on this data are going to come out in quick succession 🙂

01.03.2025 17:05 👍 1 🔁 0 💬 1 📌 0

Keep an eye on this - we're just getting started! Hope to have the ML community engaged as we continue in this direction 🚀

01.03.2025 17:03 👍 0 🔁 0 💬 1 📌 0

Tahoe-100M: A Giga-Scale Single-Cell Perturbation Atlas for Context-Dependent Gene Function and Cellular Modeling Building predictive models of the cell requires systematically mapping how perturbations reshape each cell's state, function, and behavior. Here, we present Tahoe-100M, a giga-scale single-cell atlas ...

@thejohnnyyu.bsky.social, @therealnima.bsky.social, and I are excited to tell you about Tahoe-100M, the largest publicly available single-cell dataset that measures the effect of 1200 genes on 50 cell line models. The Vevo team has outdone itself. #Tahoe100M www.biorxiv.org/content/10.1...

25.02.2025 13:25 👍 81 🔁 34 💬 1 📌 6

This was all made possible by the Mosaic platform! What is Mosaic? @thejohnnyyu.bsky.social took his work in our lab, and scaled it in every dimension… Mosaic brings a highly diversified, exquisitely optimized, and optimally balanced “cell village” approach to perturbation data collection.

25.02.2025 13:25 👍 2 🔁 1 💬 1 📌 0

Vevo Therapeutics Open Sources Tahoe-100M, the World's Largest Single-Cell Dataset, as the Inaugural Contribution to Arc Institute's New Virtual Cell Atlas /PRNewswire/ -- In a landmark move to advance AI-driven biological research, Arc Institute and Vevo Therapeutics announced today that they have partnered on...

If you are intrigued by this, and if you're working on AI/ML, single-cell biology, or drug discovery, I urge y’all to reach out to @thejohnnyyu.bsky.social, @therealnima.bsky.social or any of the @vevotherapeutics.bsky.social team. www.prnewswire.com/news-release...

25.02.2025 13:25 👍 1 🔁 2 💬 1 📌 0

No Priors Ep. 103 | With Vevo Therapeutics and the Arc Institute YouTube video by No Priors: AI, Machine Learning, Tech, & Startups

Watch @thejohnnyyu.bsky.social, @therealnima.bsky.social (@vevotherapeutics.bsky.social), @pdhsu.bsky.social, Dave Burke, and me (@arcinstitute.org) talking about virtual cells, and how #Tahoe100M, now on @arcinstitute.org's Virtual Cell Atlas, can change the game!

www.youtube.com/watch?v=ak_f...

25.02.2025 13:39 👍 8 🔁 6 💬 0 📌 0

Multiplexed mosaic tumor models reveal natural phenotypic variations in drug response within and between populations Many agents that show promise in preclinical cancer models lack efficacy in patients due to patient heterogeneity that is not captured in traditional assays. To address this problem, we have developed...

Our latest from the indefatigable @thejohnnyyu.bsky.social in collaboration with Weissman and Shokat labs. Meet GENEVA, which enables simultaneous phenotyping and profiling of cancer cell drug responses at scale; both in vitro and in vivo across a variety of models: www.biorxiv.org/content/10.1...

17.12.2024 20:16 👍 33 🔁 11 💬 1 📌 1

This will be instrumental for datasets like our Tahoe-100M, especially as we scale into normalizing 100-million-cell datasets.

11.12.2024 16:28 👍 0 🔁 0 💬 1 📌 0

scRNA-seq data sets exploding in number and size - check out scanpy & anndata for >1b cells: new experimental update includes APIs for scaling with dask from anndata, integrated with lots of scanpy and rapids-singlecell functions.
gist.github.com/ilan-gold/98...

11.12.2024 15:48 👍 50 🔁 13 💬 2 📌 0

keep in touch!

05.12.2024 23:01 👍 2 🔁 0 💬 0 📌 0

100M dataset!

05.12.2024 15:42 👍 2 🔁 0 💬 0 📌 0

Vevo Therapeutics to Release World's Largest Atlas of Single-Cell Transcriptomic Data, Tahoe-100M, to Map How Drugs Impact Patient Cells and Accelerate Discovery of New Drugs /PRNewswire/ -- Vevo Therapeutics, a biotechnology company using its Mosaic technology and next generation AI to uncover better drugs for more patients,...

@thejohnnyyu.bsky.social, @therealnima.bsky.social, Kevan Shokat, and I are excited to announce a historic achievement by our team at @vevotherapeutics.bsky.social: tinyurl.com/4rrvap3t

05.12.2024 14:25 👍 43 🔁 12 💬 4 📌 7
