Yun S. Song (@yun-s-song)

Thanks, Noah.

22.02.2026 17:37 👍 0 🔁 0 💬 0 📌 0

This work was a team effort, spearheaded by Antoine Koehl and Sebastian Prillo, with significant contributions from former undergraduate students Matthew Liu & Lillian Weng and current graduate student Bear Xiong. Many thanks to Dave Savage for help and advice with the carbonic anhydrase work. 11/n

21.02.2026 17:13 👍 1 🔁 0 💬 0 📌 0

Finally, PEINT is a strong predictor of variant effects. Since PEINT leverages a pretrained pLM in its encoder, one can view it as a general and principled framework to better align pretrained pLMs with evolutionary information and thereby improve their performance on variant effect prediction (VEP). 10/n

21.02.2026 17:13 👍 2 🔁 0 💬 1 📌 0

To test whether PEINT can generate functional proteins, we simulated the evolution of carbonic anhydrase along a tree. Despite sharing only ~40% sequence identity with any known carbonic anhydrase sequence, many of our simulated sequences retain function as measured in vivo and in vitro. 9/n

21.02.2026 17:13 👍 1 🔁 0 💬 1 📌 0

We can extend this simulation procedure across entire trees, generating evolutionary trajectories that are virtually indistinguishable from natural evolution and far surpass the capabilities of classical evolutionary models as simulation engines. 8/n

21.02.2026 17:13 👍 3 🔁 0 💬 1 📌 0

In addition, PEINT truly shines in simulating realistic evolution, where time is a dial we can use to generate new sequences with defined mutational loads. PEINT can generate structurally coherent sequences, even at large evolutionary distances, while classical models struggle. 7/n

21.02.2026 17:13 👍 1 🔁 0 💬 1 📌 0

We find that PEINT excels at retrospective evolutionary tasks: it achieves substantially better likelihoods on held-out data than classical models, and it also excels at estimating the divergence time separating a pair of unaligned sequences. 6/n

21.02.2026 17:13 👍 2 🔁 0 💬 1 📌 0

Importantly, PEINT learns insertion-deletion dynamics directly from raw, unaligned sequences, thereby eliminating potential biases from alignment errors that can lead to incorrect inference of evolutionary patterns. 5/n

21.02.2026 17:13 👍 3 🔁 0 💬 1 📌 0

CherryML: scalable maximum likelihood estimation of phylogenetic models - Nature Methods
CherryML is a method to scale up maximum likelihood estimation for general phylogenetic models of molecular evolution, providing several orders of magnitude speedup over traditional methods.

Building on prior work from our lab, CherryML, we show how to create a new class of models that inherits the best of both worlds. This framework, which we call PEINT (Protein Evolution IN Time), learns time-dependent evolutionary trajectories. 4/n

CherryML: www.nature.com/articles/s41...

21.02.2026 17:13 👍 2 🔁 0 💬 1 📌 0

On the other hand, while pLMs use deep learning architectures that excel at modeling interactions between sites, they essentially treat proteins as independent and identically distributed samples, which leads to “phylogenetic biases” and leaves them without sophisticated evolutionary reasoning. 3/n

21.02.2026 17:13 👍 3 🔁 0 💬 1 📌 0

Our work unifies two historically disparate fields: phylogenetic models & protein language models (pLMs). Classical phylogenetic models provide a rigorous treatment of time and quantify the effects of mutation and selection, but assume independence across sites and require pre-aligned sequences. 2/n

21.02.2026 17:13 👍 2 🔁 0 💬 1 📌 0

Can we simulate realistic evolutionary trajectories and “replay the tape of life”? In this work, we propose a flexible, generalizable deep learning framework for modeling how the entire protein sequence evolves over time while capturing complex interactions across sites. 1/n
doi.org/10.64898/202...

21.02.2026 17:13 👍 83 🔁 35 💬 3 📌 1

Published online on Jan 2, 2025 and just appeared in the December 2025 issue!

19.12.2025 03:02 👍 19 🔁 4 💬 0 📌 0

The registration deadline is fast approaching for ProbGen 2026! Abstracts are due by January 15, registration by January 31.

probgen2026.github.io

18.12.2025 17:09 👍 16 🔁 18 💬 1 📌 0

Replaying evolution to learn about the fitness landscape of affinity maturation
A five-year collaboration with the Victora lab is bearing fruit for evolutionary biology.

Over the past 5+ years I've had the honor of working with @wsdewitt.github.io @victora.bsky.social and many others on a project to "replay" affinity maturation evolution from a fixed starting point.

matsen.group/general/2025...

11.12.2025 17:36 👍 28 🔁 17 💬 2 📌 1

SMBE2026 Symposium 10 | Learning from evolution: AI models for genomic function
Organisers - Shu Zhang, Gladstone Institutes & UCSF, USA (Female)
Invited Speaker - Yun S. Song, University of California, Berkeley, USA (Male)

Organisers
- Shu Zhang | @gladstoneinst.bsky.social

Invited Speaker
- @yun-s-song.bsky.social | @ucberkeleyofficial.bsky.social

09.12.2025 08:28 👍 3 🔁 1 💬 0 📌 0

Rapid compensatory evolution within a multiprotein complex preserves telomere integrity
Intragenomic conflict with selfish genetic elements spurs adaptive changes in subunits of essential multiprotein complexes. Whether and how these adaptive changes disrupt interactions within such comp...

How to keep in step when your (protein) partner speeds up…

Here we investigated the adaptive remodeling of a protein-protein interaction surface essential for telomere protection.

Congrats to the whole team!

www.science.org/doi/10.1126/...

28.11.2025 17:22 👍 120 🔁 64 💬 6 📌 4

The last work of my PhD is finally out: www.pnas.org/doi/10.1073/...! This work is about accurately estimating branch lengths in the Ancestral Recombination Graph (ARG), which is achieved by a really simple framework with minimal assumptions. (1/n)

25.11.2025 20:27 👍 57 🔁 20 💬 1 📌 2

Thank you for your kind words, Josh.

These days, quite a few students prefer to take photos of the board using their phones. I have no problem with it as long as I am not in the picture.

15.11.2025 18:31 👍 5 🔁 0 💬 0 📌 0

Assistant/Associate/Full Professor – Engineering + Artificial Intelligence - College of Engineering (host academic department(s) to be determined)
University of California, Berkeley is hiring. Apply now!

An open-rank faculty search in AI + Engineering (Bioengineering included) at UC Berkeley.

Due date: Monday, Nov 3, 2025 at 11:59pm (PT)
Please help spread the news.

aprecruit.berkeley.edu/JPF05144

14.10.2025 21:52 👍 6 🔁 3 💬 1 📌 0

Not yet, but we will surely generate bp-resolution genome-wide scores for all six species studied in the paper and make them publicly available. For now, we have predictions for ~10M variants used in the S-LDSC analysis in humans.

22.09.2025 14:59 👍 3 🔁 0 💬 0 📌 0

This is truly an incredible breakthrough IMO. Really exemplifies what you get when deep domain expertise (popgen/evolution/disease genetics in this case) fuses with cleverly crafted ML. What u get r sleek, well thought out architectures that absolutely destroy the behemoths. Wow!! 1/

22.09.2025 08:34 👍 59 🔁 14 💬 1 📌 1

All in all, we believe that GPN-Star offers a scalable & flexible approach for training effective gLMs.

This work was led by my talented students @czye.bsky.social and @gonzalobenegas.bsky.social, with contributions from other lab members, @peterdfields.bsky.social at Jax, & B. Clarke at DKFZ
(n/n)

22.09.2025 05:29 👍 4 🔁 1 💬 1 📌 0

GitHub - songlab-cal/gpn: Genomic Pre-trained Network

Upon publication, we will release base-resolution predictions for the human genome and the five model organisms.
Code to train the model, run inference, and reproduce the analyses is available on GitHub (github.com/songlab-cal/...) and Hugging Face (tinyurl.com/nhhcppvm).
(9/n)

22.09.2025 05:29 👍 9 🔁 0 💬 1 📌 0

To show that GPN-Star is a robust and generalizable framework that can advance biology beyond human genetics, we apply it to train gLMs for five well-studied model organisms and demonstrate their effectiveness in assessing variant effects in these species.
(8/n)

22.09.2025 05:29 👍 4 🔁 0 💬 1 📌 0

In addition, GPN-Star exhibits meaningful nucleotide dependencies that align with known functional dependencies, indicating its potential to help understand genomic syntax. This represents a notable advance over traditional conservation scores.
(7/n)

22.09.2025 05:29 👍 7 🔁 0 💬 1 📌 0

By training GPN-Star on vertebrate, mammal, and primate alignments, we reveal task-dependent advantages of modeling deeper versus more recent evolution. These findings offer new biological insights and practical guidance for developing future gLMs and evolutionary models.
(6/n)

22.09.2025 05:29 👍 4 🔁 2 💬 1 📌 0

GPN-Star achieves unprecedented SNP heritability enrichments across over 100 human complex traits. Moreover, we devise a simple approach to incorporate tissue-specificity into the model prediction and show that it further improves heritability enrichment.
(5/n)

22.09.2025 05:29 👍 4 🔁 0 💬 1 📌 0

We compare GPN-Star with several models, including the recent AlphaGenome and Evo2 models with up to 1Mb context size and 40B parameters, and observe that GPN-Star consistently ranks at the top across a wide range of human variant effect prediction tasks.
(4/n)

22.09.2025 05:29 👍 3 🔁 0 💬 1 📌 0

We also introduce a calibration method that removes the confounding effect of mutation rate variation from gLM predictions for the first time. This improves downstream performance and enables a more direct interpretation of model scores as estimates of selective constraint.
(3/n)

22.09.2025 05:29 👍 5 🔁 1 💬 1 📌 0
