Nick Boyd

@nboyd

optimization, inverse problems, also proteins. ml at escalante. formerly: atomicai, xgenomes, broad, berkeley.

167 followers · 241 following · 56 posts · Joined 08.09.2024

Latest posts by Nick Boyd @nboyd

sometimes I wonder if Claude Code really does make me more productive. sure, it implemented this much faster than I could have, but I probably would have had the sense not to...

03.03.2026 20:32 👍 20 🔁 1 💬 4 📌 0
mosaic/examples/protenij_vhh.py at main · escalante-bio/mosaic: composite-objective protein design.

I've been testing this model a bit for design: github.com/escalante-bi... . Seems to work very well in general. For VHH could probably use higher PLM weight or something to better constrain the CDRs; for globular binders results look good.

28.02.2026 15:11 👍 2 🔁 0 💬 0 📌 0
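mosaic describes itself as composite-objective protein design. As a rough sketch of what "a higher PLM weight" means, the objective below is a weighted sum of loss terms. Everything here is hypothetical: `composite_loss`, the term names `iptm` and `plm`, and both toy loss functions are invented for illustration, are not mosaic's API, and do not compute real structure-prediction or language-model scores.

```python
from typing import Callable, Dict

# A loss term maps a sequence to a scalar; lower is better.
LossFn = Callable[[str], float]


def composite_loss(seq: str, terms: Dict[str, LossFn], weights: Dict[str, float]) -> float:
    """Hypothetical composite objective: weighted sum of terms (missing weight = 1.0)."""
    return sum(weights.get(name, 1.0) * fn(seq) for name, fn in terms.items())


def toy_iptm_loss(seq: str) -> float:
    """Toy stand-in for an interface score: rewards tryptophans (illustrative only)."""
    return -seq.count("W") / max(len(seq), 1)


def toy_plm_loss(seq: str) -> float:
    """Toy stand-in for a PLM term: penalizes the longest single-residue run."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest / max(len(seq), 1)


terms = {"iptm": toy_iptm_loss, "plm": toy_plm_loss}
seq = "WWWWWWAG"  # the toy interface term loves this; the toy PLM term does not

low_plm = composite_loss(seq, terms, {"iptm": 1.0, "plm": 0.1})
high_plm = composite_loss(seq, terms, {"iptm": 1.0, "plm": 2.0})
print(f"plm weight 0.1: {low_plm:+.3f}")  # -0.675
print(f"plm weight 2.0: {high_plm:+.3f}")  # +0.750
```

Raising the `plm` weight turns the repeat-heavy toy sequence from favorable to unfavorable, which is the sense in which a heavier PLM term constrains what the optimizer can get away with (for instance, in VHH CDRs).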
PDBx/mmCIF Data Dictionary: Dictionary Index (mmcif_pdbx.dic)

every time I have to revisit the mmcif spec I want to cry: mmcif.wwpdb.org/dictionaries...

26.02.2026 18:54 👍 1 🔁 0 💬 0 📌 0

I wonder if an architecture where the entire model participated in diffusion (inc triangle layers) would work better for these applications. Obvious computational efficiency reasons not to do this, but it sometimes seems the trunk has completely made up its mind before diffusion...

21.02.2026 00:07 👍 1 🔁 0 💬 0 📌 0
Representative results for a single target

this is from a hyperparameter sweep with 10 benchmark targets using roughly this code: gist.github.com/nboyd/8e4f32.... I haven't actually run BindCraft; could be mosaic/protenix-specific. Also, these binders *are* different even if they have similar ipTMs; in vitro results might be worse.

18.02.2026 19:50 👍 2 🔁 0 💬 0 📌 0

sad this .gif doesn't constitute reproducible scientific truth

18.02.2026 19:46 👍 0 🔁 0 💬 1 📌 0

if you don't like the huge extended helices or alpha solenoid proteins you're getting from hallucination-based protein design methods (BindCraft, mosaic, etc.), increasing the scale of the initial sequence noise (typically Gumbel) increases funkiness without hurting final metrics like ipTM

17.02.2026 21:24 👍 21 🔁 2 💬 1 📌 0
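A minimal sketch of this knob, assuming the design starts from a per-position matrix of 20 amino-acid logits initialized with scaled Gumbel noise. `init_logits` and `noise_scale` are hypothetical names, not BindCraft's or mosaic's actual parameters. Standard Gumbel(0, 1) samples are drawn by inverse-CDF sampling as -log(-log(U)) with U uniform on (0, 1).

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def gumbel(rng: random.Random) -> float:
    """One standard Gumbel(0, 1) draw via inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0; log(0) is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def init_logits(length: int, noise_scale: float, seed: int = 0) -> list[list[float]]:
    """Hypothetical initializer: scaled Gumbel noise for every (position, residue)."""
    rng = random.Random(seed)
    return [[noise_scale * gumbel(rng) for _ in AMINO_ACIDS] for _ in range(length)]


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [e / z for e in exps]


def mean_max_prob(logits: list[list[float]]) -> float:
    """Average peak residue probability per position (how 'committed' the start is)."""
    return sum(max(softmax(row)) for row in logits) / len(logits)


small = init_logits(50, noise_scale=0.1)
large = init_logits(50, noise_scale=3.0)
print(f"noise 0.1: mean peak prob {mean_max_prob(small):.3f}")
print(f"noise 3.0: mean peak prob {mean_max_prob(large):.3f}")
```

Both calls share a seed, so `large` is exactly `small` rescaled by 30 and its per-position softmax is strictly more peaked. In a hallucination loop these logits would then be optimized; the sketch only covers the initialization.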

Excellent, comprehensive rundown of the state of bio lab automation by @owlposting1.bsky.social

In retrospect, it's an important topic that has had almost zero discussion over the years!

A fun surprise to see some decade-old(!) work show up in there too.

www.owlposting.com/p/heuristics...

09.02.2026 17:02 👍 6 🔁 2 💬 1 📌 0

Naturally I had to add this to mosaic. Here's a VHH designed using `protenix_base_20250630_v1.0.0`. Example notebook here: github.com/escalante-bi...

06.02.2026 16:51 👍 4 🔁 1 💬 1 📌 0
GitHub - bytedance/Protenix: Toward High-Accuracy Open-Source Biomolecular Structure Prediction.

Protenix v1.0 is out with some very impressive performance numbers (exceeding AF3 performance on protein-protein complexes)

06.02.2026 16:46 👍 8 🔁 3 💬 1 📌 1

Obviously these models aren't perfect and are trained on finite data, the data generating distribution doesn't really exist, there are better ways to control generative models, etc etc etc. This is still often a surprisingly illuminating way to think about these models.

30.01.2026 16:45 👍 1 🔁 0 💬 0 📌 0

Fun fact: non-generative models instead produce (in theory!) $\arg\min_{\hat{x}} \mathbb{E}_{x \sim p(x \mid c)}[\mathrm{loss}(\hat{x}, x)]$, the single prediction with the lowest expected loss over p(x | c). This is why AF2 produces spaghetti and AF3 hallucinates helices.

30.01.2026 16:45 👍 1 🔁 0 💬 1 📌 0
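A 1-D caricature of this point, assuming squared error as the loss (real structure-prediction losses such as AF2's FAPE are more complicated). When p(x | c) has two well-separated modes, the expected-loss minimizer sits between them, at a value the data never takes, while a sampler returns one mode or the other. The toy distribution is invented for illustration.

```python
import random


# Toy p(x | c): a 1-D coordinate that is -1.0 or +1.0 with equal probability,
# standing in for two distinct conformations compatible with the same input c.
def sample_x(rng: random.Random) -> float:
    return rng.choice([-1.0, 1.0])


def expected_sq_loss(x_hat: float, samples: list[float]) -> float:
    """Monte Carlo estimate of E_{x ~ p(x | c)}[(x_hat - x)^2]."""
    return sum((x_hat - x) ** 2 for x in samples) / len(samples)


rng = random.Random(0)
samples = [sample_x(rng) for _ in range(2000)]

# "Non-generative" answer: the argmin of expected loss, searched over a grid.
grid = [i / 100 for i in range(-150, 151)]
x_hat = min(grid, key=lambda g: expected_sq_loss(g, samples))

# "Generative" answer: draws from p(x | c) itself.
draws = [sample_x(rng) for _ in range(5)]

print(f"argmin prediction: {x_hat:+.2f}")  # close to 0, between the two modes
print(f"samples: {draws}")                 # only ever -1.0 or +1.0
```

Under squared error the minimizer is just the mean of p(x | c), which is why the point prediction lands in a region with no probability mass.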
Filtering criteria for PPIFlow

A more recent example is PPIFlow, which trains on complexes that are probably strong binders rather than all pairs of proteins in PDB (which might not bind tightly!). Or, the “ubiquitin memorization” issue with BoltzGen: if you see a protein of length 76 in PDB, it’s almost certainly ubiquitin.

30.01.2026 16:45 👍 2 🔁 0 💬 1 📌 0

There are many examples of this in bio + ML: soluble proteinMPNN shifts the distribution of proteinMPNN by training only on soluble structures.

30.01.2026 16:45 👍 1 🔁 0 💬 1 📌 0

e.g., AF3 roughly answers the question “if you saw a protein with this sequence and MSA in PDB, what kinds of structures would it fold into?”

A corollary is you can control your generative model’s output by filtering the training data.

30.01.2026 16:45 👍 1 🔁 0 💬 1 📌 0
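A toy illustration of that corollary, assuming the "model" is just the empirical conditional of its training set (the idealized case described in this thread). The data, labels, and `is_soluble` field are invented; the sketch only shows that filtering training examples on some property shifts the learned p(x | c), in the spirit of soluble ProteinMPNN.

```python
from collections import Counter

# Hypothetical toy training set of (condition c, outcome x, is_soluble) triples.
# The counts are invented; only the before/after comparison matters.
TRAIN = (
    [("helix_bundle", "hydrophobic_surface", False)] * 30
    + [("helix_bundle", "hydrophobic_surface", True)] * 10
    + [("helix_bundle", "polar_surface", True)] * 55
    + [("helix_bundle", "polar_surface", False)] * 5
)


def conditional(data, c):
    """Empirical p(x | c): outcome frequencies among examples with condition c."""
    counts = Counter(x for cond, x, _ in data if cond == c)
    total = sum(counts.values())
    return {x: n / total for x, n in counts.items()}


# An idealized model trained on everything reproduces the full conditional...
before = conditional(TRAIN, "helix_bundle")

# ...while one trained only on "soluble" examples learns a shifted conditional,
# even though we filtered on metadata rather than on x itself.
soluble_only = [ex for ex in TRAIN if ex[2]]
after = conditional(soluble_only, "helix_bundle")

print(before)  # {'hydrophobic_surface': 0.4, 'polar_surface': 0.6}
print(after)
```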

If everything goes well, these models sample from p(x | c), where the joint distribution is the data-generating distribution of the training set; i.e., they answer the question: “if you saw this conditioning information in your training set, what kinds of data would it be attached to?”

30.01.2026 16:45 👍 1 🔁 0 💬 1 📌 0
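A toy version of that idealized model, assuming it is nothing more than the empirical conditional distribution of its training set. The "PDB" below is invented; it borrows this thread's length-76/ubiquitin observation to show a sampler answering "what data was attached to this condition?".

```python
import random
from collections import defaultdict

# Hypothetical toy "PDB" of (condition c, data x) pairs: c is chain length and
# x is which protein the chain is. Counts are invented for illustration.
TRAINING_SET = (
    [(76, "ubiquitin")] * 98
    + [(76, "other_76_residue_protein")] * 2
    + [(120, "protein_a")] * 50
    + [(120, "protein_b")] * 50
)


def build_conditional(pairs):
    """Group training data by condition: an empirical p(x | c) to sample from."""
    table = defaultdict(list)
    for c, x in pairs:
        table[c].append(x)
    return table


def sample(table, c, rng):
    """Idealized conditional generative model: draw x ~ p(x | c)."""
    return rng.choice(table[c])


table = build_conditional(TRAINING_SET)
rng = random.Random(0)
draws = [sample(table, 76, rng) for _ in range(1000)]
frac_ubiquitin = draws.count("ubiquitin") / len(draws)
print(f"fraction of length-76 draws that are ubiquitin: {frac_ubiquitin:.3f}")
```

Asking for a length that never appears in training raises an `IndexError` here (`rng.choice` on an empty list), a crude reminder that this idealized model only knows the conditions it has seen.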
Developability comes for free...? Did AI de novo antibody generation models learn developability properties without explicitly being trained on this?

Love this theory for why antibody generative models produce sequences with nice developability properties: it’s a reflection of PDB. ayusuf.substack.com/p/developabi....

This is a very nice example of one way of thinking about conditional generative models.

30.01.2026 16:45 👍 8 🔁 2 💬 1 📌 0

Thanks Martin!!

22.01.2026 22:45 👍 1 🔁 0 💬 0 📌 0
More permissive license? · Issue #5 · Mingchenchen/PPIFlow: Hi, Congratulations on these results and paper; this is a really elegant approach! Would you consider switching to a more standard license, for instance MIT? I'd like to use this in some open sourc...

Potential fork of PPIFlow with a more permissive license is very exciting: github.com/Mingchenchen...
Highly recommend their paper: results are super impressive and it's *not* an AF3 clone

22.01.2026 21:44 👍 8 🔁 3 💬 0 📌 0
Main_SI1_High-Affinity Protein Binder Design via Flow Matching and In Silico Maturation.pdf

drive.google.com/file/d/1KTPO...

21.01.2026 20:33 👍 1 🔁 0 💬 1 📌 0

I’ve got a physical copy 😅

21.01.2026 20:19 👍 0 🔁 0 💬 1 📌 0

JAX projects are more modular in my experience: it's sometimes really hard to get two torch projects to install in the same environment let alone interoperate nicely

probably I did too much functional programming + Julia in my formative years

13.01.2026 17:16 👍 1 🔁 0 💬 1 📌 0

speed/JIT/parallelization are really nice but it’s mostly a style thing for me. I find most large torch projects incomprehensible: lots of OO/imperative code/manual batching etc + frameworks like lightning/omegaconf. I can't go back to life before vmap & other higher-order functions.

13.01.2026 17:14 👍 2 🔁 0 💬 1 📌 0

inspired by @delalamo.xyz

13.01.2026 15:01 👍 1 🔁 0 💬 0 📌 0
GitHub - nboyd/jigandmpnn: Fully vibe-coded translation of ligand/soluble/etc+proteinMPNN to JAX/eqx

vibe translated ligand/protein/soluble-mpnn from PyTorch to JAX. not sure if this works, but it was pretty fun and took 45 minutes of my time. Claude Code is going to make my virtuous no-torch lifestyle a lot easier... github.com/nboyd/jigand...

13.01.2026 14:58 👍 4 🔁 0 💬 2 📌 0
Building antibodies blindfolded: the paradox of de novo design By Natasha Murakowska and Joseph Harman

Loved this post from A-Alpha: aalphabio.substack.com/p/building-a.... If anything I think the IPSAE (or any other post-hoc metric) picture is even worse than they show: after optimization the fraction of false positives would (probably) be even higher than in this dataset

09.01.2026 22:46 👍 7 🔁 3 💬 1 📌 0
In the Nipah binder competition, which protein designer will have proteins that bind? Nipah is one of the deadliest viruses in the world and considered one of the top future pandemic risks. We're hosting a protein design competition on Proteinbase where people can design binders agai...

TL;DR: this was a really fun exercise but now is probably a good time to bet against me on manifold.markets/Proteinbase/...

08.01.2026 15:51 👍 5 🔁 0 💬 0 📌 0
~180 lines of code to win the in silico portion of the Adaptyv Nipah binding competition Here's the script we used to get 1st place in the in silico portion of the Adaptyv Nipah competition: import modal def download_boltz2(): from mosaic.models.boltz2 import Boltz2 Boltz2() ...

finally wrote up some notes on the adaptyv competition: blog.escalante.bio/180-lines-of....

08.01.2026 15:50 👍 11 🔁 2 💬 1 📌 2

To speculate wildly though: the Boltz2 confidence module seems really, really easy to please even compared to a single AF2-multimer model. I wonder if this means hallucination is more likely to produce interfaces Boltz2 likes but AF2-SC (and likely physics 😅) does not.

20.12.2025 15:52 👍 3 🔁 0 💬 0 📌 0

IMO it’s hard to draw conclusions from these data because each method has so many hyper-parameters. There isn't much work on AF3-gen hallucination; BindCraft is the result of some really careful and brilliant HPO. I was honestly surprised to get hits with Boltz2 for the work described in that post.

20.12.2025 15:42 👍 1 🔁 0 💬 1 📌 0