This plot is quite the indictment of fine-tuned PLMs, showing how performance is entirely data-dependent and, at the upper end, equally achievable with randomized model weights.
We made FLIP2, a protein fitness benchmark spanning seven new datasets, including enzymes, protein-protein interactions, and light-sensitive proteins, as well as splits that measure generalization relevant to real-world protein engineering campaigns.
This was a great team effort from @kdidi.bsky.social @sarahalamdari.bsky.social @alexijie.bsky.social Bruce Wittmann @kadinaj.bsky.social @avapamini.bsky.social @thisismadani.bsky.social Maya Czeneszew and @machine.learning.bio
Paper: www.biorxiv.org/content/10.6...
Website: flip.protein.properties
Interestingly, simpler models often matched or outperformed fine-tuned protein language models, challenging the utility of existing transfer learning techniques for fitness prediction.
We then evaluated zero-shot protein language model (pLM) sequence-likelihood scores, ridge-regression baselines, and fine-tuned pLMs on these splits, confirming that the FLIP2 splits are more challenging than random splits with the same number of training examples.
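For readers unfamiliar with what a ridge-regression baseline looks like for fitness prediction, here's a minimal sketch (not the paper's code): aligned variant sequences are one-hot encoded and mapped to fitness labels with a closed-form ridge solve in NumPy. The sequences, labels, and regularization strength are all invented for illustration.

```python
# Illustrative sketch of a ridge-regression fitness baseline.
# Sequences and fitness labels are toy values, not real data.
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(seq: str) -> np.ndarray:
    """Flatten an aligned sequence into a (length * 20) one-hot vector."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.ravel()

train_seqs = ["ACDEF", "ACDEG", "AVDEF", "ACWEF"]  # toy aligned variants
train_y = np.array([1.0, 0.8, 0.4, 0.2])           # toy fitness labels

X = np.stack([one_hot(s) for s in train_seqs])
alpha = 1.0  # L2 regularization strength (hyperparameter)
# Closed-form ridge solution: w = (X^T X + alpha * I)^-1 X^T y
w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ train_y)

pred = one_hot("AVDEG") @ w  # predicted fitness for an unseen variant
print(float(pred))
```

In practice the features would be pLM embeddings or one-hot encodings of much longer sequences, but the baseline itself really is this simple, which is part of what makes the benchmark results striking.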
Train a protein language model to predict one homolog from another given the amount of evolutionary time separating them.
@antoinekoehl.bsky.social @junhaobearxiong.bsky.social
www.biorxiv.org/content/10.6...
🇹🇼 🇹🇼 🇹🇼
Excellent review on using generative models to design enzymes
@noeliaferruz.bsky.social @lassemiddendorf.bsky.social
arxiv.org/abs/2602.03779
Hot take: it's 2026; we don't need to be manually doing citations or asking LLMs to do them. Overleaf should just autogenerate the citations from DOIs.
For no reason, I remembered today that I too once got to take a picture holding a Nobel Prize that I didn't earn
I've had those but not sure I ever took a picture with one
EDEN: a family of genomic language models trained on up to 9.7 trillion nucleotides from @basecamp-research.bsky.social's BaseData can design large serine recombinases, bridge recombinases, and antimicrobial peptides.
www.biorxiv.org/content/10.6...
Happy to have played a small part in this!
A dataset of 40 million protein families and an autoregressive model of protein families. Great to see other protein Atlases popping up after Dayhoff!!
@judewells.bsky.social @dmmiller597.bsky.social
www.biorxiv.org/content/10.6...
Finetune a codon-level language model with 30k tryptophan synthases, then generate diverse, functional enzymes with broad substrate scopes.
Théophile Lambert @jsunn-y.bsky.social @francesarnold.bsky.social
www.biorxiv.org/content/10.1...
"Notably, curators struggled to locate consistent sequence annotations and performance values because the data were scattered across the main text, figures, and supplementary files. Conflicts were resolved by a third curator, who consulted the original figures and deposited the consensus record."
Enzyme Engineering Database (EnzEngDB): a platform for sharing and interpreting sequence–function relationships across protein engineering campaigns
@francescazfl.bsky.social @jsunn-y.bsky.social @francesarnold.bsky.social @arianemora.bsky.social
Paper: doi.org/10.1093/nar/...
DB: enzengdb.org
An energy-based model of protein conformational space can be used to predict structure from sequence, sample from the conformational landscape, rank structures, and predict mutation effects.
@sokrypton.org
www.biorxiv.org/content/10.6...
Becoming a real Asian by making my kid practice Chinese characters and math while he waits for his Saturday cello class.
Train a model to identify circularly-permuted structural homologs, then use it to discover novel pairs of related proteins!
@aidenosinetrip1 @abulnaga.bsky.social @sokrypton.org
www.biorxiv.org/content/10.1...
A benchmark that measures whether protein language models can recognize structural similarity even with low sequence similarity.
A little shameless self-promotion: CARP does very well!
@Zinnia__Ma
www.biorxiv.org/content/10.1...
First Thanksgiving since 2021 that I haven't been sick!
Use quantitative in vitro mass spectrometry to measure temperature- and pH-induced aggregation for over 18,000 natural and de novo designed protein domains!
Cydney Martell @savaslab.bsky.social @grocklin.bsky.social
www.biorxiv.org/content/10.1...
You have until Dec 1 to apply to the bioml PhD research internship!
This is where you apply to work with me,
@alexijie.bsky.social @avapamini.bsky.social @lcrawford.bsky.social or Kristen Severson!
(new) link and some instructions below
apply.careers.microsoft.com/careers/job/...
Make sure you attach a research statement. To do this, go to your career profile, click on "Resume Manager" in the upper right, and upload your statement under "Other documents"
How spicy is too spicy for a work potluck? Asking for a friend