Rohit Singh

@rohitsingh8080

Computational biologist. Faculty @DukeU. Co-founder http://martini.ai. Prev @MIT_CSAIL. Did quant investing for a while, before returning to research. https://singhlab.net

975
Followers
373
Following
81
Posts
13.09.2023
Joined

Latest posts by Rohit Singh @rohitsingh8080

More broadly: bio foundation models are being scaled up rapidly, but representational consistency across scales is rare.

Reverse distillation offers a post-training fix: take any model family, anchor large to small, and get embeddings that scale predictably.

10.03.2026 21:49 👍 0 🔁 1 💬 0 📌 0
Preview
GitHub - rohitsinghlab/plm_reverse_distillation: Reverse distillation to improve PLM scaling.

Try it out! Pre-trained reverse-distilled ESM-2 models are on HuggingFace.

It's a 2-line code change to swap in reverse-distilled embeddings wherever you currently use ESM-2.

See the code fragment on Github:
github.com/rohitsinghla...
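
To give a flavor, here is a minimal sketch of what the swap might look like; this is my sketch, not the repo's fragment, and the reverse-distilled checkpoint id below is hypothetical (check the HuggingFace page for the real model names):

```python
# Hedged sketch of the advertised two-line swap. The reverse-distilled
# checkpoint id is HYPOTHETICAL; see the GitHub repo / HuggingFace page
# for the actual model names.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")

# Before: model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")
# After (hypothetical id):
model = AutoModel.from_pretrained("rohitsinghlab/rd-esm2-650M")

inputs = tokenizer("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", return_tensors="pt")
embeddings = model(**inputs).last_hidden_state  # use downstream as before
```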

10.03.2026 21:49 👍 0 🔁 0 💬 1 📌 0
Preview
Reverse Distillation: Consistently Scaling Protein Language Model Representations Unlike the predictable scaling laws in natural language processing and computer vision, protein language models (PLMs) scale poorly: for many tasks, models within the same family plateau or even decre...

Led by two fantastic undergrads, Darius Catrina and Christian Bepler, with @samsl.io

We'll be presenting this work at ICLR 2026.

Preprint: arxiv.org/abs/2603.07710

10.03.2026 21:49 👍 0 🔁 0 💬 1 📌 0

On ProteinGym DMS benchmarks, rd.15B is consistently the strongest performer. Think of the reverse-distilled models as removing the guesswork in selecting the best scale: no more wondering whether 650M or 3B or 15B is the right choice for your task.

10.03.2026 21:49 👍 0 🔁 0 💬 1 📌 0

Reverse distillation disentangles them via an orthogonal decomposition. We find the optimal linear decomposition that minimizes reconstruction error from small to large (SVD for the win, again!)

We did try non-linear decompositions but they didn't actually help.
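
For the curious, a toy sketch of the idea (my construction, not the paper's exact recipe): fit the best least-squares linear map from the large embeddings to the small ones, then rotate the large space into a small-aligned block plus its orthogonal complement.

```python
# Toy sketch of the orthogonal decomposition; assumptions mine, not the
# paper's exact procedure.
import numpy as np

def reverse_distill(X_small, X_large):
    """X_small: (n, d_s) small-PLM embeddings; X_large: (n, d_l), d_l >= d_s."""
    d_s = X_small.shape[1]
    # Best linear reconstruction of the small embeddings from the large ones.
    W, *_ = np.linalg.lstsq(X_large, X_small, rcond=None)  # (d_l, d_s)
    # Full SVD: the first d_s left singular vectors span col(W) (the
    # small-aligned subspace); the rest span its orthogonal complement.
    U, _, _ = np.linalg.svd(W, full_matrices=True)         # (d_l, d_l)
    anchored = X_large @ U[:, :d_s]  # features shared with the small model
    extra = X_large @ U[:, d_s:]     # specialized features the large model adds
    return anchored, extra

# e.g., with stand-ins for ESM-2 8M (320-dim) and 650M (1280-dim) embeddings:
anchored, extra = reverse_distill(np.random.randn(100, 320),
                                  np.random.randn(100, 1280))
```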

10.03.2026 21:49 👍 0 🔁 0 💬 1 📌 0

We took a representation-learning approach to PLM scaling. The intuition is that small PLMs, constrained by capacity, are forced to encode the most universal protein features. Large models add specialized features but entangle them with the universal ones.

10.03.2026 21:49 👍 0 🔁 0 💬 1 📌 0

Reverse distillation gives PLM embeddings a Matryoshka (nested Russian dolls) structure: each larger model's representation exactly subsumes the smaller model's. Scale up and you only add information, never lose any features.

This makes for consistent PLM scaling.
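
A toy illustration of the nesting (the construction is mine; the dimensions match ESM-2's 8M/650M/15B hidden sizes of 320/1280/5120):

```python
# Toy illustration of the Matryoshka property: each scale strictly extends
# the previous one with new dimensions only. Construction mine.
import numpy as np

rng = np.random.default_rng(0)
n = 4
base = rng.normal(size=(n, 320))     # stand-in for the smallest model's features
extra1 = rng.normal(size=(n, 960))   # features the 650M-scale model adds
extra2 = rng.normal(size=(n, 3840))  # features the 15B-scale model adds

emb_small = base
emb_mid = np.hstack([emb_small, extra1])  # strictly extends the small embedding
emb_large = np.hstack([emb_mid, extra2])  # again: only new dimensions appear

# Scaling down is just prefix slicing; no features are lost or re-mixed.
assert np.allclose(emb_large[:, :emb_small.shape[1]], emb_small)
```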

10.03.2026 21:49 👍 0 🔁 0 💬 1 📌 0
Post image

Protein language models don't scale reliably: bigger doesn't always mean better for embeddings. What if what you need from ESM-2 15B is already inside ESM-2 8M?

Introducing Reverse Distillation, which flips the usual distillation script: small models anchor and decompose large ones.

10.03.2026 21:49 👍 4 🔁 0 💬 1 📌 1

Excited about this! I'll focus on analyzing immune repertoires with PLMs.

Biological systems are inherently multi-scale. With the representational power and speed of PLMs, we can now bridge the molecular and systems scales, to study what makes each of us distinctive, immunity-wise.

12.06.2025 16:24 👍 2 🔁 0 💬 1 📌 0

Yes, we have lots of exciting collaborative projects at the interface of computation and biology. Deep expertise in many domains between our labs, so a wonderful and committed training environment!

09.05.2025 14:59 👍 1 🔁 1 💬 0 📌 0

Our fantastic trainees and collaborators made this possible. Kanchan Jha, Aditya Parekh and Pooja Parameswaran led the dry-lab work, while Daichi Shonai and Aki Uezu led the wet-lab work.

09.05.2025 14:40 👍 1 🔁 0 💬 0 📌 0

This was a wonderful collab with @scottsoderling.bsky.social, whose lab is situated next to ours.

If you want to do cool collaborative work like this, join us! We're building a great ecosystem of AIxBio at Duke.

09.05.2025 14:40 👍 1 🔁 1 💬 2 📌 0

I loved working on this project! Neither the kinase specificity prediction nor the proximity proteomics is enough on its own; you need both.

I think this project shows how a close collaboration between biologists and computer scientists can introduce entirely new capabilities.

09.05.2025 14:40 👍 0 🔁 0 💬 1 📌 0

With KolossuS, we studied sleepiness in mice, and especially the signaling impact of Sik3, a kinase whose mutation leads to sleepy mice.

We think our KolossuS + proteomics approach has a ton of potential in deconvolving kinases involved in specific processes. 12/

09.05.2025 14:40 👍 0 🔁 0 💬 1 📌 0

And of course, the interpretability of the kinase embedding space led to some fun explorations.

For example, we asked whether the phylogeny of kinase families actually corresponds to substrate preferences. Broadly yes, but with a few key exceptions. 11/

09.05.2025 14:40 👍 0 🔁 0 💬 1 📌 0

KolossuS' architecture applies broadly across all kinases (and generalizes to other species), and it is well-calibrated.

This, combined with proximity proteomics that lets us assay a tissue of interest and sub-cellular locale, gives us the end-to-end solution we need. 10/

09.05.2025 14:40 👍 0 🔁 0 💬 1 📌 0
Post image

A poorly calibrated model might always score one kinase highly even if, on a per-kinase basis, it is accurate on substrate specificities. See this note from the preprint: 9/

09.05.2025 14:40 👍 0 🔁 0 💬 1 📌 0

Breadth and interpretability are self-explanatory, but why emphasize calibration? And what does it even mean?

The key insight is that given a phosphorylated peptide, we'll computationally screen it against every human kinase. Calibration is critical for that. 8/
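
A toy example of why (my construction, not from the paper): a scorer can rank substrates correctly within each kinase yet fail a cross-kinase screen if its raw score scale differs per kinase. Calibrating each score against a per-kinase background fixes that.

```python
# Toy illustration of the calibration issue; numbers are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_kinases, n_bg = 5, 1000

# Each kinase's raw scores live on a different scale (offset), so the kinase
# with the largest offset tends to "win" any raw cross-kinase comparison.
offsets = rng.uniform(0.0, 5.0, size=n_kinases)
background = offsets[:, None] + rng.normal(size=(n_kinases, n_bg))

# Query peptide: kinase 2 is the true hit (score boosted by ~3 sigma).
query = offsets + rng.normal(size=n_kinases)
query[2] += 3.0

# Calibrate: per-kinase percentile against that kinase's own background,
# making scores comparable across kinases.
percentile = (background < query[:, None]).mean(axis=1)
print("raw argmax:", int(query.argmax()),
      "| calibrated argmax:", int(percentile.argmax()))
```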

09.05.2025 14:40 👍 0 🔁 0 💬 1 📌 0
Post image

Using the ESM-2 15B model (PLM scaling worked, for once!), we predict kinase-substrate specificity by learning a co-embedding of the two.

As models go, KolossuS is relatively simple. In its design, we emphasized three aspects: breadth, calibration and interpretability. 7/
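
For intuition, a minimal sketch of a co-embedding model; the layer sizes and the similarity score are my assumptions, not the actual KolossuS architecture. Kinase and substrate-peptide PLM embeddings are projected into a shared space and pairs are scored by similarity.

```python
# Hedged sketch of a kinase-substrate co-embedding; dims and scoring are
# assumptions. 5120 is the ESM-2 15B hidden size.
import torch
import torch.nn as nn

class CoEmbedder(nn.Module):
    def __init__(self, d_plm=5120, d_shared=256):
        super().__init__()
        self.kinase_head = nn.Sequential(nn.Linear(d_plm, d_shared), nn.ReLU(),
                                         nn.Linear(d_shared, d_shared))
        self.substrate_head = nn.Sequential(nn.Linear(d_plm, d_shared), nn.ReLU(),
                                            nn.Linear(d_shared, d_shared))

    def forward(self, kinase_emb, substrate_emb):
        zk = self.kinase_head(kinase_emb)
        zs = self.substrate_head(substrate_emb)
        # Cosine similarity in the shared space as the specificity score.
        return nn.functional.cosine_similarity(zk, zs, dim=-1)

model = CoEmbedder()
score = model(torch.randn(2, 5120), torch.randn(2, 5120))  # mean-pooled PLM embeddings
```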

09.05.2025 14:40 👍 0 🔁 0 💬 1 📌 0

The relevant assay here is phosphoproteomics: you can identify the phosphorylated peptides in a sample. Proximity proteomics will let you further target a specific tissue and sub-cellular neighborhood. But that doesn't tell you which kinases are active.

Enter KolossuS.

09.05.2025 14:40 👍 0 🔁 0 💬 1 📌 0

Identifying the precise kinase involved in your pathway and tissue of interest therefore requires a mix of computation and experimentation. 5/

09.05.2025 14:40 👍 0 🔁 0 💬 1 📌 0
Post image

As writers, kinases are only moderately specific: multiple kinases can often phosphorylate the same substrate. This moderate specificity makes signal integration easier, but it also creates disease risk. 4/

09.05.2025 14:40 👍 0 🔁 0 💬 1 📌 0

Kinases are the prototypical example of something nature does often: take a simple biophysical phenomenon (here phosphorylation, but see also ubiquitination, acetylation, etc.) and supercharge it as a signaling vehicle by evolving a spectrum of signal writers (e.g., kinases) and readers. 3/

09.05.2025 14:40 👍 1 🔁 0 💬 1 📌 0

Surprisingly little is known about kinases. Despite their therapeutic and biological importance, 80% of the human kinome is "dark," i.e., we don't have a good sense of which substrates a kinase targets, or in which cell types or sub-cellular compartments. 2/

09.05.2025 14:40 👍 2 🔁 0 💬 1 📌 0
Preview
Deep Learning-coupled Proximity Proteomics to Deconvolve Kinase Signaling In Vivo Deconvolving the substrates of hundreds of kinases linked to phosphorylation networks driving cellular behavior is a fundamental, unresolved biological challenge, largely due to the poorly understood ...

Read on for the skeetorial. But if you're in a rush, here's the preprint:

www.biorxiv.org/content/10.1...

09.05.2025 14:40 👍 0 🔁 0 💬 1 📌 0
Post image

Introducing KolossuS to address a 50-year-old problem: which kinases are active in your pathway of interest?

As computational biologists, our work mostly involves post-hoc analysis algorithms. KolossuS is the rare case where an ML model enables entirely new capabilities. 1/

09.05.2025 14:40 👍 13 🔁 2 💬 1 📌 0

The BEST part: This would not have been possible without the close (and SO FUN!) collaboration of my lab with @rohitsingh8080.bsky.social and the lab of Masashi Yanagisawa.

28.04.2025 22:00 👍 1 🔁 1 💬 0 📌 0
Post image

Over 50 yrs since the discovery of protein kinases, 80% of human kinases still have ≤20 known substrates, and many are "dark." I'm EXCITED to announce our new work towards solving this: combining (1) deep learning with (2) proximity proteomics in vivo! ➡️
www.biorxiv.org/content/10.1...

28.04.2025 22:00 👍 34 🔁 8 💬 1 📌 0
Post image Post image Post image Post image

We had a great morning session, including a keynote on single cells and long reads by @aliciao.bsky.social, and talks on spatial transcriptomics by @rohitsingh8080.bsky.social, DNA storage, and TAD inference.

25.04.2025 02:47 👍 12 🔁 6 💬 0 📌 0