We have started a project trying to predict the interactions/structures of all yeast protein pairs using an AlphaFold pooling approach. We are making the current dataset open and we welcome collaborations.
www.evocellnet.com/2026/03/mapp...
Can we simulate realistic evolutionary trajectories and "replay the tape of life"? In this work, we propose a flexible, generalizable deep learning framework for modeling how the entire protein sequence evolves over time while capturing complex interactions across sites. 1/n
doi.org/10.64898/202...
From the Nexus
I still haven't built up my network here so my following patterns are a narrow slice of my interests.
Can proteins fold and function with half of the amino acid alphabet?
Using only 10 residues, we designed stable, mutation-resilient structures, with no aromatics or basics involved.
A minimalist foundation for ancient biology and synthetic design. tinyurl.com/37t8br4v
#ProteinDesign #OriginsOfLife
Mingchen replied to me on Twitter that it's also on bioRxiv now www.biorxiv.org/content/10.6...
My time in @martinsteinegger.bsky.social's group is ending, but I'm staying in Korea to build a lab at Sungkyunkwan University School of Medicine. If you or someone you know is interested in molecular machine learning and open-source bioinformatics, please reach out. I am hiring!
mirdita.org
I'm really excited to break up the holiday relaxation time with a new preprint that benchmarks AlphaFold3 (AF3)/"co-folding" methods with 2 new stringent performance tests.
Thread below - but first some links:
A longer take:
fraserlab.com/2025/12/29/k...
Preprint:
www.biorxiv.org/content/10.6...
New preprint🚨
Imagine (re)designing a protein via inverse folding. AF2 predicts the designed sequence to a structure with pLDDT 94 & you get 1.8 Å RMSD to the input. Perfect design?
What if I told u that the structure has 4 solvent-exposed Trp and 3 Pro where a Gly should be?
Why to be wary🧵
Cody also put in a ton of extra work to make the code organized and usable in the GitHub repo: github.com/Anantharaman...
It links to a Colab notebook for model inference, training data, and pretrained models.
Excited for our new paper on a genome language model for viruses in @natcomms.nature.com: "Protein Set Transformer: a protein-based genome language model to power high-diversity viromics"! Led by PhD student Cody Martin in collaboration with @anthonygitter.bsky.social
doi.org/10.1038/s414...
Thanks, I didn't realize Rogue Scholar minted DOIs
Use @prereview.bsky.social for preprints and something else for other manuscripts?
What are good places to post an unsolicited manuscript peer review these days? I don't have a blog. I read manuscripts across arXiv, bioRxiv, ChemRxiv, OpenReview, random white papers, journals, etc. Do I dump it on Zenodo, post it here, and send it to the authors?
Our Assay2Mol manuscript was published at EMNLP 2025 doi.org/10.18653/v1/...
See the preprint thread below for a summary of the methodology, results, and code. We added more control experiments in this version related to protein sequence identity and generated molecule size.
@hkws.bsky.social and I are creating the Madison AI for Proteins (MAIP) group to discuss early-stage research at monthly meetups, share computational resources, and grow this local community. Visit mad-ai-proteins.github.io to sign up for announcements and watch for our 2026 events.
This looks like a fantastic resource to study human kinase signalling. So much MS instrument time.
Something fun and sciencey is coming soon to Madison
Looks very interesting. Can I think of this like a more extreme form of the evotuning from UniRep or doi.org/10.1101/2024... except it uses one sequence instead of the sequence plus homologs?
Bioconductor R package: bioconductor.org/packages/MPAC
Shiny app to explore results in manuscript: connect.doit.wisc.edu/content/122/
MPAC uses PARADIGM as the probabilistic model but makes many improvements:
- data-driven omic data discretization
- permutation testing to eliminate spurious predictions
- full workflow and downstream analyses in an R package
- Shiny app for interactive visualization
Overview of the MPAC workflow. MPAC calculates inferred pathway levels (IPLs) from real and permuted CNA and RNA data. It filters real IPLs using the permuted IPLs to remove spurious IPLs. Then, MPAC focuses on the largest pathway subset network with filtered IPLs to compute GO term enrichment, predict patient groups, and identify key group-specific proteins.
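The filtering step in the workflow description above can be illustrated with a toy example. This is only a minimal sketch, not MPAC's actual implementation (MPAC is an R package): the function name `filter_by_permutation`, the gene names, the values, and the "outside the range of the permuted values" rule are all assumptions chosen for illustration.

```python
def filter_by_permutation(real_values, permuted_values):
    """Zero out real values that look no different from permuted (null) values.

    Hypothetical simplification: a real value is kept only if it is more
    extreme than every permuted value observed for the same entity.
    """
    filtered = {}
    for entity, real in real_values.items():
        null = permuted_values[entity]
        if real > max(null) or real < min(null):
            filtered[entity] = real   # stands out from the null distribution
        else:
            filtered[entity] = 0.0    # indistinguishable from permuted data
    return filtered


# Toy inferred pathway levels from real data (values are made up)
real = {"TP53": 2.5, "MYC": 0.1, "EGFR": -1.8}

# Toy levels inferred from three permutations of the input data
permuted = {
    "TP53": [0.3, -0.2, 0.9],
    "MYC": [0.4, -0.5, 0.2],
    "EGFR": [-0.6, 0.1, 0.5],
}

filtered = filter_by_permutation(real, permuted)
print(filtered)
```

In this toy case MYC is zeroed out because its real value sits inside the range produced by permuted inputs, while TP53 and EGFR are kept.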
The journal version of our Multi-omic Pathway Analysis of Cells (MPAC) software is now out: doi.org/10.1093/bioi...
MPAC uses biological pathway graphs to model DNA copy number and gene expression changes and infer activity states of all pathway members.
I found out that Neurosnap offers ESMFold via API neurosnap.ai/service/ESMF...
I may test how many calls are possible with the free academic plan to see if it is worthwhile to update my repo.
AI + physics for protein engineering
Our collaboration with @anthonygitter.bsky.social is out in Nature Methods! We use synthetic data from molecular modeling to pretrain protein language models. Congrats to Sam Gelman and the team!
www.nature.com/articles/s41...
Does anyone know whether there's a functioning API to ESMfold?
(api.esmatlas.com/foldSequence... gives me Service Temporarily Unavailable)
The main GitHub repo github.com/gitter-lab/m... links to the extensive resources for running Rosetta simulations at scale to generate new training data, training METL models, running our models, and accessing our datasets. 8/
Fig. 6: Low-N GFP design.
We can use METL for low-N protein design. We trained METL on Rosetta simulations of GFP biophysical attributes and only 64 experimental examples of GFP brightness. It designed fluorescent variants with 5 and 10 mutations, including some whose mutations were entirely absent from the training set. 7/
Fig. 5: Function-specific simulations improve METL pretraining for GB1.
A powerful aspect of pretraining on biophysical simulations is that the simulations can be customized to match the protein function and experimental assay. Our expanded simulations of the GB1-IgG complex with Rosetta InterfaceAnalyzer improve METL predictions of GB1 binding. 6/
Fig. 3: Comparative performance across extrapolation tasks.
We also benchmark METL on four types of difficult extrapolation. For instance, positional extrapolation provides training data from some sequence positions and tests predictions at different sequence positions. Linear regression completely fails in this setting. 5/
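A positional extrapolation split like the one described above can be sketched as follows. This is an illustrative sketch under assumptions, not the METL benchmark code: the function name `positional_split`, the variant encoding, and the choice to drop variants that mix train and held-out positions are all hypothetical.

```python
def positional_split(variants, train_positions):
    """Split variants so test mutations occur only at positions unseen in training.

    variants: list of (mutations, score), where mutations is a tuple of
              (position, amino_acid) pairs.
    train_positions: set of sequence positions allowed in the training set.
    """
    train, test = [], []
    for mutations, score in variants:
        positions = {pos for pos, _ in mutations}
        if positions <= train_positions:
            train.append((mutations, score))   # every mutation at a train position
        elif positions.isdisjoint(train_positions):
            test.append((mutations, score))    # every mutation at a held-out position
        # Variants that mix train and held-out positions are dropped (assumption).
    return train, test


# Toy variants with made-up scores
variants = [
    (((3, "A"),), 0.9),
    (((3, "G"), (7, "L")), 0.4),
    (((12, "K"),), 1.1),
    (((7, "V"), (12, "E")), 0.2),
]

train, test = positional_split(variants, train_positions={3, 7})
print(len(train), len(test))
```

One way to see why a one-hot linear model struggles in this setting: it has no trained coefficients for mutations at positions that never appear in the training set.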
Fig. 2: Comparative performance of Linear, Rosetta total score, EVE, RaSP, Linear-EVE, ESM-2, ProteinNPT, METL-Global and METL-Local across different training set sizes.
We compare these approaches on deep mutational scanning datasets with increasing training set sizes. Biophysical pretraining helps METL generalize well with small training sets. However, augmented linear regression with EVE scores is great on some of these assays. 4/
METL models pretrained on Rosetta biophysical attributes learn different protein representations than general protein language models like ESM-2 or protein family-specific models like EVE. These new representations are valuable for machine learning-guided protein engineering. 3/