sometimes I wonder if Claude Code really does make me more productive. sure, it implemented this much faster than I could have, but I probably would have had the sense not to...
I've been testing this model a bit for design: github.com/escalante-bi.... Seems to work very well in general. For VHHs, could probably use a higher PLM weight or something to better constrain the CDRs; for globular binders the results look good.
every time I have to revisit the mmcif spec I want to cry: mmcif.wwpdb.org/dictionaries...
I wonder if an architecture where the entire model participated in diffusion (inc triangle layers) would work better for these applications. Obvious computational efficiency reasons not to do this, but it sometimes seems the trunk has completely made up its mind before diffusion...
Representative results for a single target
this is from a hyperparameter sweep with 10 benchmark targets using roughly this code: gist.github.com/nboyd/8e4f32.... I haven't actually run BindCraft; could be mosaic/protenix-specific. Also, these binders *are* different even if they have similar ipTMs; in vitro results might be worse
sad this .gif doesn't constitute reproducible scientific truth
if you don't like the huge extended helices or alpha solenoid proteins you're getting from hallucination-based protein design methods (bindcraft, mosaic, etc), increasing the scale of the initial sequence noise (typically Gumbel) increases funkiness without hurting final metrics like ipTM
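A minimal sketch of what this trick could look like in JAX. Note the initializer name and setup are hypothetical, not mosaic's or BindCraft's actual code:

```python
import jax
import jax.numpy as jnp

def init_sequence_logits(key, length, n_aa=20, noise_scale=1.0):
    """Hypothetical hallucination-design initializer: i.i.d. Gumbel noise
    as the initial soft-sequence logits. A larger noise_scale gives more
    diverse ("funkier") starting points for the optimizer."""
    return noise_scale * jax.random.gumbel(key, (length, n_aa))

logits = init_sequence_logits(jax.random.PRNGKey(0), length=100, noise_scale=2.0)
probs = jax.nn.softmax(logits, axis=-1)  # soft sequence fed to the structure model
```

Since softmax is invariant to a shared shift but not to scale, a larger `noise_scale` makes the initial soft sequence more peaked and more varied across restarts.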
Excellent, comprehensive rundown of the state of bio lab automation by @owlposting1.bsky.social
In retrospect, it's an important topic that has had almost zero discussion over the years!
A fun surprise to see some decade-old(!) work show up in there too.
www.owlposting.com/p/heuristics...
Naturally I had to add this to mosaic. Here's a VHH designed using `protenix_base_20250630_v1.0.0`. Example notebook here: github.com/escalante-bi...
Protenix v1.0 is out with some very impressive performance numbers (exceeding AF3 performance on protein-protein complexes)
Obviously these models aren't perfect and are trained on finite data, the data generating distribution doesn't really exist, there are better ways to control generative models, etc etc etc. This is still often a surprisingly illuminating way to think about these models.
Fun fact: non-generative models instead produce (in theory!) argmin_x E[loss(x, y)], where the expectation is over y ~ p(y | c). This is why AF2 produces spaghetti and AF3 hallucinates helices.
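A toy numeric illustration of that fun fact (made-up bimodal data, not a real model): under squared loss, the minimizer of the expected loss is the mean of p(y | c), which can sit between the modes and resemble neither.

```python
import jax.numpy as jnp

# Toy data: y has two equally likely modes (think: two conformations).
y = jnp.concatenate([jnp.full(500, -1.0), jnp.full(500, 1.0)])

# A non-generative model trained with squared loss converges (in theory) to
# argmin_x E[(x - y)^2] = E[y]: the mean, which lies between the modes and
# looks like neither -- the "spaghetti" failure mode.
xs = jnp.linspace(-2.0, 2.0, 401)
expected_loss = ((xs[:, None] - y[None, :]) ** 2).mean(axis=1)
x_star = xs[jnp.argmin(expected_loss)]
# x_star is ~0.0: far from both modes at -1 and +1.
```

A generative model trained on the same data would instead sample from both modes.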
Filtering criteria for PPIFlow
A more recent example is PPIFlow, which trains on complexes that are probably strong binders rather than all pairs of proteins in PDB (which might not bind tightly!). Or, the "ubiquitin memorization" issue with BoltzGen: if you see a protein of length 76 in PDB, it's almost certainly ubiquitin.
There are many examples of this in bio + ML: soluble ProteinMPNN shifts the distribution of ProteinMPNN by training only on soluble structures.
e.g., AF3 roughly answers the question "if you saw a protein with this sequence and MSA in PDB, what kinds of structures would it fold into?"
A corollary is that you can control your generative model's output by filtering the training data.
If everything goes well, these models sample from p(x | c), where the joint distribution is the data generating distribution of the training set. i.e., they answer the question: "if you saw this conditioning information in your training set, what kinds of data would it be attached to?"
Love this theory for why antibody generative models produce sequences with nice developability properties: it's a reflection of PDB. ayusuf.substack.com/p/developabi....
This is a very nice example of one way of thinking about conditional generative models.
Thanks Martin!!
Potential fork of PPIFlow with a more permissive license is very exciting: github.com/Mingchenchen...
Highly recommend their paper: results are super impressive and it's *not* an AF3 clone
I've got a physical copy
JAX projects are more modular in my experience: it's sometimes really hard to get two torch projects to install in the same environment let alone interoperate nicely
probably I did too much functional programming + Julia in my formative years
speed/JIT/parallelization are really nice but itβs mostly a style thing for me. I find most large torch projects incomprehensible: lots of OO/imperative code/manual batching etc + frameworks like lightning/omegaconf. I can't go back to life before vmap & other higher-order functions.
inspired by @delalamo.xyz
vibe translated ligand/protein/soluble-mpnn from PyTorch to JAX. not sure if this works, but it was pretty fun and took 45 minutes of my time. Claude Code is going to make my virtuous no-torch lifestyle a lot easier... github.com/nboyd/jigand...
Loved this post from A-Alpha: aalphabio.substack.com/p/building-a.... If anything I think the IPSAE (or any other post-hoc metric) picture is even worse than they show: after optimization the fraction of false positives would (probably) be even higher than in this dataset
TL;DR: this was a really fun exercise but now is probably a good time to bet against me on manifold.markets/Proteinbase/...
finally wrote up some notes on the adaptyv competition: blog.escalante.bio/180-lines-of....
To speculate wildly though: the Boltz2 confidence module seems really, really easy to please even compared to a single AF2-multimer model. I wonder if this means hallucination is more likely to produce interfaces Boltz2 likes but AF2-SC (and likely physics) does not.
IMO it's hard to draw conclusions from these data because each method has so many hyperparameters. There isn't much work on AF3-gen hallucination; BindCraft is the result of some really careful and brilliant HPO. I was honestly surprised to get hits with Boltz2 for the work described in that post.