Kevin K. Yang 楊凱筌

@kevinkaichuang

Principal Researcher in BioML at Microsoft Research. He/him/他. 🇹🇼 yangkky.github.io

6,000 Followers · 3,348 Following · 379 Posts · Joined 23.08.2023

Latest posts by Kevin K. Yang 楊凱筌 @kevinkaichuang

This plot is quite the indictment of fine-tuned PLMs, showing how performance is entirely data-dependent and, at the upper end of performance, equally achievable with randomized model weights

26.02.2026 18:24 👍 13 🔁 2 💬 0 📌 0

We made FLIP2, a protein fitness benchmark spanning seven new datasets, including enzymes, protein-protein interactions, and light-sensitive proteins, as well as splits that measure generalization relevant to real-world protein engineering campaigns.

25.02.2026 21:25 👍 51 🔁 15 💬 1 📌 1

This was a great team effort from @kdidi.bsky.social @sarahalamdari.bsky.social @alexijie.bsky.social Bruce Wittmann @kadinaj.bsky.social @avapamini.bsky.social @thisismadani.bsky.social Maya Czeneszew and @machine.learning.bio

25.02.2026 21:25 👍 3 🔁 0 💬 0 📌 1

Paper: www.biorxiv.org/content/10.6...
Website: flip.protein.properties

25.02.2026 21:25 👍 3 🔁 0 💬 1 📌 0

Interestingly, simpler models often matched or outperformed fine-tuned protein language models, challenging the utility of existing transfer learning techniques for fitness prediction.

25.02.2026 21:25 👍 3 🔁 1 💬 1 📌 0

We then evaluated zero-shot protein language model (pLM) sequence-likelihood scores, ridge-regression baselines, and fine-tuned pLMs on these splits, confirming that the FLIP2 splits are more challenging than random splits with the same number of training examples.

25.02.2026 21:25 👍 1 🔁 0 💬 1 📌 0

Train a protein language model to predict one homolog from another given the amount of evolutionary time separating them.

@antoinekoehl.bsky.social @junhaobearxiong.bsky.social

www.biorxiv.org/content/10.6...

24.02.2026 22:27 👍 12 🔁 0 💬 0 📌 0

🇹🇼 🇹🇼 🇹🇼

15.02.2026 02:04 👍 8 🔁 0 💬 0 📌 0

Excellent review on using generative models to design enzymes

@noeliaferruz.bsky.social @lassemiddendorf.bsky.social

arxiv.org/abs/2602.03779

04.02.2026 23:13 👍 11 🔁 2 💬 0 📌 0

Hot take: it's 2026. We don't need to be manually doing citations or asking LLMs to do them. Overleaf should just autogenerate the citations from DOIs.

24.01.2026 01:51 👍 27 🔁 4 💬 2 📌 1
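
For what it's worth, DOI-to-citation lookup is already possible through DOI content negotiation: a request to doi.org with an `Accept: application/x-bibtex` header typically returns a BibTeX entry. A minimal sketch, not anything Overleaf actually does; the function names `bibtex_request` and `fetch_bibtex` are made up for illustration, and the DOI in the usage note is a placeholder.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def bibtex_request(doi: str) -> Request:
    """Build a DOI content-negotiation request that asks for BibTeX.

    Hypothetical helper: accepts a bare DOI, a "doi:" prefixed DOI,
    or a full https://doi.org/ URL.
    """
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Percent-encode unusual characters but keep the "/" that
    # separates the DOI prefix from its suffix.
    url = "https://doi.org/" + quote(doi, safe="/")
    return Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi: str, timeout: float = 10.0) -> str:
    """Resolve the DOI over the network and return the BibTeX text.

    Assumes the registration agency behind the DOI supports the
    application/x-bibtex media type; not every DOI will.
    """
    with urlopen(bibtex_request(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_bibtex("10.1000/xyz123")` with a real DOI in place of the placeholder should return an entry that can be pasted into a `.bib` file, though the generated keys and field quality vary by publisher.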

For no reason, I remembered today that I too once got to take a picture holding a Nobel Prize that I didn't earn

16.01.2026 15:14 👍 121 🔁 9 💬 2 📌 1

I've had those but not sure I ever took a picture with one

16.01.2026 19:30 👍 1 🔁 0 💬 0 📌 0

EDEN, a family of genomic language models trained on up to 9.7 trillion nucleotides from @basecamp-research.bsky.social's BaseData, can design large serine recombinases, bridge recombinases, and antimicrobial peptides.

www.biorxiv.org/content/10.6...

Happy to have played a small part in this!

13.01.2026 15:16 👍 18 🔁 5 💬 0 📌 0

A dataset of 40 million protein families and an autoregressive model of protein families. Great to see other protein Atlases popping up after Dayhoff!!

@judewells.bsky.social @dmmiller597.bsky.social

www.biorxiv.org/content/10.6...

07.01.2026 23:07 👍 28 🔁 8 💬 0 📌 0

Fine-tune a codon-level language model with 30k tryptophan synthases, then generate diverse, functional enzymes with broad substrate scopes.

Théophile Lambert @jsunn-y.bsky.social @francesarnold.bsky.social

www.biorxiv.org/content/10.1...

16.12.2025 00:25 👍 22 🔁 4 💬 0 📌 1

"Notably, curators struggled to locate consistent sequence annotations and performance values because the data were scattered across the main text, figures, and supplementary files. Conflicts were resolved by a third curator, who consulted the original figures and deposited the consensus record."

12.12.2025 20:56 👍 0 🔁 0 💬 0 📌 0

Enzyme Engineering Database (EnzEngDB): a platform for sharing and interpreting sequence–function relationships across protein engineering campaigns

@francescazfl.bsky.social @jsunn-y.bsky.social @francesarnold.bsky.social @arianemora.bsky.social

Paper: doi.org/10.1093/nar/...
DB: enzengdb.org

12.12.2025 20:56 👍 23 🔁 7 💬 1 📌 0

An energy-based model of protein conformational space can be used to predict structure from sequence, sample from the conformational landscape, rank structures, and predict mutation effects.

@sokrypton.org

www.biorxiv.org/content/10.6...

10.12.2025 23:17 👍 47 🔁 14 💬 0 📌 1

Becoming a real Asian by making my kid practice Chinese characters and math while he waits for his Saturday cello class.

06.12.2025 17:59 👍 11 🔁 0 💬 0 📌 0

Train a model to identify circularly-permuted structural homologs, then use it to discover novel pairs of related proteins!

@aidenosinetrip1 @abulnaga.bsky.social @sokrypton.org

www.biorxiv.org/content/10.1...

05.12.2025 21:08 👍 21 🔁 3 💬 0 📌 0

A benchmark that measures whether protein language models can detect structural similarity even with low sequence similarity.

A little shameless self-promotion: CARP does very well!

@Zinnia__Ma

www.biorxiv.org/content/10.1...

02.12.2025 22:11 👍 13 🔁 0 💬 0 📌 0

Yeah it's this recipe: thewoksoflife.com/sticky-rice-...

28.11.2025 14:05 👍 3 🔁 0 💬 0 📌 0

First Thanksgiving since 2021 that I haven't been sick!

28.11.2025 02:27 👍 13 🔁 0 💬 1 📌 0

Use quantitative in vitro mass spectrometry to measure temperature- and pH-induced aggregation for over 18,000 natural and de novo designed protein domains!

Cydney Martell @savaslab.bsky.social @grocklin.bsky.social

www.biorxiv.org/content/10.1...

21.11.2025 21:05 👍 14 🔁 0 💬 1 📌 0

You have until Dec 1 to apply to the BioML PhD research internship!

This is where you apply to work with me, @alexijie.bsky.social, @avapamini.bsky.social, @lcrawford.bsky.social, or Kristen Severson!

(new) link and some instructions below

apply.careers.microsoft.com/careers/job/...

20.11.2025 15:30 👍 5 🔁 3 💬 1 📌 2

Make sure you attach a research statement. To do this, go to your career profile, click on "Resume Manager" in the upper right, and upload your statement under "Other documents".

20.11.2025 15:30 👍 0 🔁 0 💬 0 📌 0

How spicy is too spicy for a work potluck? Asking for a friend

20.11.2025 11:55 👍 9 🔁 1 💬 3 📌 0