Anthony Gitter (@anthonygitter)

Mapping the yeast structural interactome with AlphaFold3: an open call for collaboration We are excited to announce the early-stage release of our S. cerevisiae structural interactome mapping project. Using AlphaFold3 (AF3), w...

We have started a project to predict the interactions/structures of all yeast protein pairs using an AlphaFold pooling approach. We are making the current dataset open and we welcome collaborations.
www.evocellnet.com/2026/03/mapp...

04.03.2026 10:36 👍 97 🔁 52 💬 6 📌 0

Can we simulate realistic evolutionary trajectories and “replay the tape of life”? In this work, we propose a flexible, generalizable deep learning framework for modeling how the entire protein sequence evolves over time while capturing complex interactions across sites. 1/n
doi.org/10.64898/202...

21.02.2026 17:13 👍 83 🔁 35 💬 3 📌 1

👋 from the Nexus

I still haven't built up my network here so my following patterns are a narrow slice of my interests.

12.02.2026 14:31 👍 1 🔁 0 💬 1 📌 0

Ancient amino acid sets enable stable protein folds Early proteins likely arose from a chemically limited set of amino acids available through prebiotic chemistry, raising a central question in molecular evolution: could such primitive compositions yie...

Can proteins fold and function with half of the amino acid alphabet?
Using only 10 residues, we designed stable, mutation-resilient structures—no aromatics or basics involved.
A minimalist foundation for ancient biology and synthetic design. tinyurl.com/37t8br4v
#ProteinDesign #OriginsOfLife

03.11.2025 16:48 👍 25 🔁 11 💬 1 📌 0

Mingchen replied to me on Twitter that it's also on bioRxiv now www.biorxiv.org/content/10.6...

23.01.2026 15:41 👍 2 🔁 0 💬 0 📌 0

Mirdita Lab - Laboratory for Computational Biology & Molecular Machine Learning Mirdita Lab builds scalable bioinformatics methods.

My time in @martinsteinegger.bsky.social's group is ending, but I’m staying in Korea to build a lab at Sungkyunkwan University School of Medicine. If you or someone you know is interested in molecular machine learning and open-source bioinformatics, please reach out. I am hiring!
mirdita.org

20.01.2026 11:07 👍 104 🔁 55 💬 7 📌 1

Know when to co-fold'em This is the official web page for the James Fraser Lab at UCSF.

I'm really excited to break up the holiday relaxation time with a new preprint that benchmarks AlphaFold3 (AF3)/“co-folding” methods with 2 new stringent performance tests.

Thread below - but first some links:
A longer take:
fraserlab.com/2025/12/29/k...

Preprint:
www.biorxiv.org/content/10.6...

29.12.2025 22:25 👍 72 🔁 30 💬 5 📌 2

New preprint🚨
Imagine (re)designing a protein via inverse folding. AF2 folds the designed sequence into a structure with pLDDT 94 & you get 1.8 Å RMSD to the input. Perfect design?
What if I told u that the structure has 4 solvent-exposed Trp and 3 Pro where a Gly should be?

Why to be wary🧵👇

16.12.2025 15:15 👍 58 🔁 22 💬 4 📌 1

GitHub - AnantharamanLab/protein_set_transformer: Protein Set Transformer (PST) framework for training protein-language-model-based genome language models. Inference is possible for viral genomes usin... Protein Set Transformer (PST) framework for training protein-language-model-based genome language models. Inference is possible for viral genomes using our pretrained viral foundation model. - Anan...

Cody also put in a ton of extra work to make the code organized and usable in the GitHub repo: github.com/Anantharaman...

It links to a Colab notebook for model inference, training data, and pretrained models.

15.12.2025 21:55 👍 1 🔁 1 💬 0 📌 0

Protein Set Transformer: a protein-based genome language model to power high-diversity viromics - Nature Communications A genome language model, Protein Set Transformer, trained on viral datasets, uncovers evolutionary rules of protein content and organization driving precise virus identification, host prediction, and ...

Excited for our new paper on a genome language model for viruses in @natcomms.nature.com: "Protein Set Transformer: a protein-based genome language model to power high-diversity viromics"! Led by PhD student Cody Martin in collaboration with @anthonygitter.bsky.social

doi.org/10.1038/s414...

15.12.2025 18:40 👍 10 🔁 4 💬 1 📌 0

Thanks, I didn't realize Rogue Scholar minted DOIs

12.12.2025 20:22 👍 0 🔁 0 💬 0 📌 0

Use @prereview.bsky.social for preprints and something else for other manuscripts?

12.12.2025 16:49 👍 1 🔁 0 💬 0 📌 0

What are good places to post an unsolicited manuscript peer review these days? I don't have a blog. I read manuscripts across arXiv, bioRxiv, ChemRxiv, OpenReview, random white papers, journals, etc. Do I dump it on Zenodo, post it here, and send it to the authors?

12.12.2025 16:49 👍 2 🔁 1 💬 2 📌 0

Our Assay2Mol manuscript was published at EMNLP 2025 doi.org/10.18653/v1/...

See the preprint thread below for a summary of the methodology, results, and code. We added more control experiments in this version related to protein sequence identity and generated molecule size.

21.11.2025 15:19 👍 0 🔁 0 💬 0 📌 0

@hkws.bsky.social and I are creating the Madison AI for Proteins (MAIP) group to discuss early-stage research at monthly meetups, share computational resources, and grow this local community. Visit mad-ai-proteins.github.io to sign up for announcements and watch for our 2026 events.

20.11.2025 16:29 👍 1 🔁 0 💬 0 📌 0

This looks like a fantastic resource to study human kinase signalling. So much MS instrument time.

19.11.2025 06:18 👍 13 🔁 3 💬 0 📌 0

Something fun and sciencey is coming soon to Madison

14.11.2025 17:20 👍 0 🔁 0 💬 0 📌 1

Protein Language Model Fitness Is a Matter of Preference Leveraging billions of years of evolution, scientists have trained protein language models (pLMs) to understand the sequence and structure space of proteins aiding in the design of more functional pro...

Looks very interesting. Can I think of this like a more extreme form of the evotuning from UniRep or doi.org/10.1101/2024... except it uses one sequence instead of the sequence plus homologs?

23.10.2025 22:23 👍 3 🔁 0 💬 1 📌 0

MPAC Multi-omic Pathway Analysis of Cells (MPAC), integrates multi-omic data for understanding cellular mechanisms. It predicts novel patient groups with distinct pathway profiles as well as identifying ke...

Bioconductor R package: bioconductor.org/packages/MPAC

Shiny app to explore results in manuscript: connect.doit.wisc.edu/content/122/

10.10.2025 14:56 👍 0 🔁 0 💬 0 📌 0

MPAC uses PARADIGM as the probabilistic model but makes many improvements:
- data-driven omic data discretization
- permutation testing to eliminate spurious predictions
- full workflow and downstream analyses in an R package
- Shiny app for interactive visualization

10.10.2025 14:56 👍 0 🔁 0 💬 1 📌 0

Overview of the MPAC workflow. MPAC calculates inferred pathway levels (IPLs) from real and permuted CNA and RNA data. It filters real IPLs using the permuted IPLs to remove spurious IPLs. Then, MPAC focuses on the largest pathway subset network with filtered IPLs to compute GO term enrichment, predict patient groups, and identify key group-specific proteins.
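
The filtering step described above (compare values inferred from real data against values inferred from permuted data, and drop the ones that look spurious) can be illustrated with a minimal Python sketch. This is not MPAC's code, and MPAC itself is an R package. The function name `permutation_filter`, the `n_sd` cutoff, and the mean plus or minus standard deviation rule are all assumptions chosen for illustration; the post does not state the actual filtering criterion.

```python
from statistics import mean, stdev


def permutation_filter(real, permuted, n_sd=2.0):
    """Keep real values that fall outside the spread of permuted values.

    real:     dict mapping an entity name to the value inferred from real data
    permuted: dict mapping the same names to a list of values inferred
              from permuted data (at least two per entity)
    n_sd:     hypothetical cutoff in standard deviations (an assumption,
              not a documented MPAC parameter)
    """
    kept = {}
    for name, value in real.items():
        null_values = permuted[name]
        mu = mean(null_values)
        sd = stdev(null_values)
        # A real value is kept only if it is further from the permuted
        # mean than n_sd permuted standard deviations.
        if abs(value - mu) > n_sd * sd:
            kept[name] = value
    return kept


# Toy example with made-up numbers: "A" stands out from its permuted
# background, "B" does not.
real_values = {"A": 3.0, "B": 0.1}
permuted_values = {
    "A": [-0.1, 0.0, 0.1, 0.05],
    "B": [-0.2, 0.0, 0.2, 0.1],
}
filtered = permutation_filter(real_values, permuted_values)
```

In this toy input only "A" survives the filter, because its real value lies far outside the range produced by the permutations.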

The journal version of our Multi-omic Pathway Analysis of Cells (MPAC) software is now out: doi.org/10.1093/bioi...

MPAC uses biological pathway graphs to model DNA copy number and gene expression changes and infer activity states of all pathway members.

10.10.2025 14:56 👍 2 🔁 1 💬 1 📌 0

🧬 Use ESMFold Online | Neurosnap Bulk protein structure prediction model that only requires a single amino acid sequence as input. Much faster than AlphaFold2 since no MSAs are required (but slightly less accurate too).

I found out that Neurosnap offers ESMFold via API neurosnap.ai/service/ESMF...

I may test how many calls are possible with the free academic plan to see if it is worthwhile to update my repo.

09.10.2025 02:25 👍 2 🔁 1 💬 0 📌 0

Biophysics-based protein language models for protein engineering - Nature Methods Mutational effect transfer learning (METL) is a protein language model framework that unites machine learning and biophysical modeling. Transformer-based neural networks are pretrained on biophysical simulation data to capture fundamental relationships between protein sequence, structure and energetics.

AI + physics for protein engineering 🚀
Our collaboration with @anthonygitter.bsky.social is out in Nature Methods! We use synthetic data from molecular modeling to pretrain protein language models. Congrats to Sam Gelman and the team!
🔗 www.nature.com/articles/s41...

01.10.2025 19:07 👍 5 🔁 1 💬 0 📌 0

Does anyone know whether there's a functioning API to ESMFold?

(api.esmatlas.com/foldSequence... gives me Service Temporarily Unavailable)

30.09.2025 14:11 👍 3 🔁 1 💬 2 📌 0

GitHub - gitter-lab/metl: Mutational Effect Transfer Learning (METL) framework for pretraining and finetuning biophysics-informed protein language models Mutational Effect Transfer Learning (METL) framework for pretraining and finetuning biophysics-informed protein language models - gitter-lab/metl

The main GitHub repo github.com/gitter-lab/m... links to the extensive resources for running Rosetta simulations at scale to generate new training data, training METL models, running our models, and accessing our datasets. 8/

11.09.2025 17:00 👍 0 🔁 0 💬 0 📌 0

Fig. 6: Low-N GFP design.

We can use METL for low-N protein design. We trained METL on Rosetta simulations of GFP biophysical attributes and only 64 experimental examples of GFP brightness. It designed fluorescent 5- and 10-mutation variants, including some whose mutations fall entirely outside the mutations seen in the training set. 7/

11.09.2025 17:00 👍 0 🔁 0 💬 1 📌 0

Fig. 5: Function-specific simulations improve METL pretraining for GB1.

A powerful aspect of pretraining on biophysical simulations is that the simulations can be customized to match the protein function and experimental assay. Our expanded simulations of the GB1-IgG complex with Rosetta InterfaceAnalyzer improve METL predictions of GB1 binding. 6/

11.09.2025 17:00 👍 0 🔁 0 💬 1 📌 0

Fig. 3: Comparative performance across extrapolation tasks.

We also benchmark METL on four types of difficult extrapolation. For instance, positional extrapolation provides training data from some sequence positions and tests predictions at different sequence positions. Linear regression completely fails in this setting. 5/
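
A positional extrapolation split like the one described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `positional_split`, the `train_frac` and `seed` parameters, the variant representation, and the choice to drop variants that mix train and test positions are assumptions, not the procedure used in the METL paper.

```python
import random


def positional_split(variants, seq_len, train_frac=0.8, seed=0):
    """Split variants so train and test mutate disjoint sequence positions.

    variants: list of (mutations, score) pairs, where mutations is a list
              of (position, new_residue) tuples with 0-based positions
    seq_len:  length of the protein sequence
    """
    rng = random.Random(seed)
    positions = list(range(seq_len))
    rng.shuffle(positions)

    n_train = int(train_frac * seq_len)
    train_pos = set(positions[:n_train])
    test_pos = set(positions[n_train:])

    train, test = [], []
    for mutations, score in variants:
        mutated = {pos for pos, _ in mutations}
        if mutated <= train_pos:
            # Every mutated position belongs to the training positions.
            train.append((mutations, score))
        elif mutated <= test_pos:
            # Every mutated position belongs to the held-out positions.
            test.append((mutations, score))
        # Variants touching both groups are dropped (an assumption here).
    return train, test, train_pos, test_pos


# Toy example: one single-mutation variant at each of 10 positions.
toy_variants = [([(i, "A")], float(i)) for i in range(10)]
train_set, test_set, train_positions, test_positions = positional_split(
    toy_variants, seq_len=10
)
```

A model evaluated on `test_set` never sees a training example that mutates any of the held-out positions, which is what makes the task an extrapolation rather than an interpolation.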

11.09.2025 17:00 👍 0 🔁 0 💬 1 📌 0

Fig. 2: Comparative performance of Linear, Rosetta total score, EVE, RaSP, Linear-EVE, ESM-2, ProteinNPT, METL-Global and METL-Local across different training set sizes.

We compare these approaches on deep mutational scanning datasets with increasing training set sizes. Biophysical pretraining helps METL generalize well with small training sets. However, augmented linear regression with EVE scores is great on some of these assays. 4/

11.09.2025 17:00 👍 0 🔁 0 💬 1 📌 0

METL models pretrained on Rosetta biophysical attributes learn different protein representations than general protein language models like ESM-2 or protein family-specific models like EVE. These new representations are valuable for machine learning-guided protein engineering. 3/

11.09.2025 17:00 👍 1 🔁 0 💬 1 📌 0
