
Alan Amin

@alannawzadamin

Faculty fellow at NYU working with @andrewgwils.bsky.social. Statistics & machine learning for proteins, RNA, DNA. Prev: @jura.bsky.social; PhD with Debora Marks. Website: alannawzadamin.github.io

187 Followers · 348 Following · 22 Posts · Joined 23.11.2024

Latest posts by Alan Amin @alannawzadamin

A Unification of Discrete, Gaussian, and Simplicial Diffusion To model discrete sequences such as DNA, proteins, and language using diffusion, practitioners must choose between three major methods: diffusion in discrete space, Gaussian diffusion in Euclidean spa...

Excited about our new paper that unifies discrete, Gaussian, and simplicial diffusion, enabling model comparison, likelihood evaluation, stable training, and more, including a DNA design application! Amazing work from @alannawzadamin.bsky.social, Alina, Lily, and team! arxiv.org/abs/2512.15923

20.12.2025 21:23 👍 26 🔁 3 💬 0 📌 2
Post image

Many thanks for the award for this work at the AI4NA workshop at ICLR!

More experiments and details of our linear algebra in the paper! Come say hi at ICML! 7/7
Paper: arxiv.org/abs/2506.19598
Code: github.com/AlanNawzadAm...

25.06.2025 14:03 👍 0 🔁 0 💬 0 📌 0
Post image

Ablations on model size and number of features show that larger models trained on more features make more accurate predictions with no evidence of plateauing! This suggests further improvements by training across many phenotypes, or across populations in future work! 6/7

25.06.2025 14:03 👍 0 🔁 1 💬 1 📌 0
Post image

We can now train flexible models on many features. DeepWAS models predict significant enrichment of effect in conserved regions, accessible chromatin, and TF binding sites! And, as shown above, they make better phenotype predictions in practice! 5/7

25.06.2025 14:03 👍 1 🔁 0 💬 1 📌 0
Post image

Our idea is to rearrange the linear algebra problem to counterintuitively increase matrix size, but make it "better conditioned". Iterative algorithms (like CG) converge on huge matrices quickly! By moving to GPU we achieved another order of magnitude speed-up! 4/7

25.06.2025 14:03 👍 1 🔁 1 💬 1 📌 0
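A rough illustration of the idea in the post above (not the paper's actual reformulation): conjugate gradients needs far fewer iterations on a well-conditioned matrix than on an ill-conditioned one of the same size, which is why trading matrix size for conditioning can pay off.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Plain conjugate gradients for a symmetric positive-definite A.
    Returns the solution and the number of iterations used."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    for it in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            return x, it
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x, max_iter

rng = np.random.default_rng(0)
n = 200
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
# Well-conditioned SPD matrix: eigenvalues in [1, 2].
A_good = Q @ np.diag(np.linspace(1.0, 2.0, n)) @ Q.T
# Ill-conditioned SPD matrix: eigenvalues spread over [1e-4, 1].
A_bad = Q @ np.diag(np.geomspace(1e-4, 1.0, n)) @ Q.T
b = rng.standard_normal(n)

x_good, iters_good = conjugate_gradient(A_good, b)
_, iters_bad = conjugate_gradient(A_bad, b)
print(iters_good, iters_bad)  # the well-conditioned solve takes far fewer iterations
```

Since CG only needs matrix-vector products, each iteration is O(M²) for a dense M×M matrix, so a well-conditioned system is solved in roughly O(M²) overall instead of the O(M³) of a direct factorization.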
Post image

But to train on the full likelihood, we must solve big linear algebra problems because variants in the genome are correlated. A naive method would take O(M³) – intractable for many variants! By reformulating the problem, we reduce this to roughly O(M²), enabling large models. 3/7

25.06.2025 14:03 👍 1 🔁 0 💬 1 📌 0
Post image

Many previous methods used the computationally light “LDSR objective” and saw no benefit from larger models – maybe deep learning isn’t useful here? No! Using the full likelihood, DeepWAS unlocks the potential of deep priors, improving phenotype prediction on UK Biobank! 2/7

25.06.2025 14:03 👍 3 🔁 0 💬 1 📌 0
Post image

We can make population genetics studies more powerful by building priors of variant effect size from features like binding. But we’ve been stuck on linear models! We introduce DeepWAS to learn deep priors on millions of variants! #ICML2025 Andres Potapczynski, @andrewgwils.bsky.social 1/7

25.06.2025 14:03 👍 7 🔁 3 💬 1 📌 1
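To make the setup in the thread above concrete, here is a hypothetical sketch of a prior on per-variant effect sizes whose variance depends on functional annotations, contrasting a linear log-variance model with a small MLP. All names, shapes, and parameter values are illustrative assumptions, not the DeepWAS code.

```python
import numpy as np

rng = np.random.default_rng(0)
M, F = 1000, 8                              # variants, annotation features
features = rng.standard_normal((M, F))      # e.g. binding, chromatin, conservation

# Classical approach: log-variance of the effect-size prior is linear in annotations.
w = 0.1 * rng.standard_normal(F)
sigma2_linear = np.exp(features @ w)

# Deep prior: a small MLP on the same annotations (a stand-in for a deep
# architecture; the real model is not specified here).
W1, b1 = 0.1 * rng.standard_normal((F, 32)), np.zeros(32)
w2 = 0.1 * rng.standard_normal(32)
sigma2_deep = np.exp(np.tanh(features @ W1 + b1) @ w2)

# Effect sizes drawn under the deep prior: beta_m ~ N(0, sigma2_deep[m]).
beta = rng.normal(0.0, np.sqrt(sigma2_deep))
```

The point of the contrast: both priors map annotations to a per-variant variance, but only the deep version can capture nonlinear interactions between features.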
Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion Discrete diffusion models, like continuous diffusion models, generate high-quality samples by gradually undoing noise applied to datapoints with a Markov process. Gradual generation in theory comes wi...

Details and more results demonstrating scaling and on language in the paper, code for training classical and SCUD models on github! Great working with Nate Gruver and @andrewgwils.bsky.social! 7/7
arXiv: arxiv.org/abs/2506.08316
Code: github.com/AlanNawzadAm...

16.06.2025 14:20 👍 1 🔁 0 💬 0 📌 0
Post image

When applying SCUD to gradual processes like Gaussian (images) and BLOSUM (proteins), we combine masking's scheduling advantage with domain-specific inductive biases, outperforming both masking and classical diffusion! 6/7

16.06.2025 14:20 👍 0 🔁 0 💬 1 📌 0
Post image

We show that controlling the amount of information about transition times in SCUD interpolates between uniform noise and masking, clearly illustrating why masking has superior performance. But SCUD also applies to other forward processes! 5/7

16.06.2025 14:20 👍 0 🔁 0 💬 1 📌 0
Post image

That’s exactly what we do to build schedule-conditioned diffusion (SCUD) models! After some math, training a SCUD model is like training a classical model except time is replaced with the number of transitions at each position, a soft version of how “masked” each position is! 4/7

16.06.2025 14:20 👍 0 🔁 0 💬 1 📌 0
Post image

But in practice, SOTA diffusion models have detectable errors in transition times! The exception is masking, which is typically parameterized to bake in the known distribution of “when”. Why don’t we represent this knowledge in other discrete diffusion models? 3/7

16.06.2025 14:20 👍 0 🔁 0 💬 1 📌 0
Post image

In discrete space, the forward noising process involves jump transitions between states. Reversing these paths involves learning when and where to transition. Often the “when” is known in closed form a priori, so it should be easy to learn… 2/7

16.06.2025 14:20 👍 0 🔁 0 💬 1 📌 0
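A minimal sketch of such a forward jump process, with a uniform transition kernel (a toy illustration, not the paper's code): the number of jumps per position is Poisson with a known rate, so "when" is available in closed form a priori, while "where" the reverse process should land must be learned.

```python
import numpy as np

def forward_noise(x0, rate, t, num_states, rng):
    """Simulate a uniform-rate jump process on a discrete sequence up to time t.
    Jump times follow a Poisson process (the closed-form "when"); each jump
    lands uniformly over states (the "where")."""
    x = x0.copy()
    n_jumps = rng.poisson(rate * t, size=x.shape)  # jump count per position
    jumped = n_jumps > 0
    # Under a uniform kernel, a position that jumped at least once is uniform
    # over all states, regardless of how many further jumps it made.
    x[jumped] = rng.integers(0, num_states, size=jumped.sum())
    return x, n_jumps

rng = np.random.default_rng(0)
x0 = rng.integers(0, 4, size=100)        # e.g. a DNA sequence over {A, C, G, T}
xt, n_jumps = forward_noise(x0, rate=1.0, t=0.5, num_states=4, rng=rng)
```

Positions with zero jumps stay clean; masking can be seen as the special case where "jumped" positions are sent to a dedicated mask state, which is exactly why the jump schedule is trivially visible there.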
Post image

There are many domain-specific noise processes for discrete diffusion, but masking dominates! Why? We show masking exploits a key property of discrete diffusion, which we use to unlock the potential of those structured processes and beat masking! w/ Nate Gruver and @andrewgwils.bsky.social 1/7

16.06.2025 14:20 👍 5 🔁 1 💬 1 📌 1

Thrilled to announce that I am joining DTU in Copenhagen in the fall as an assistant professor of chemistry.

My research group will focus on fundamental methodology in machine learning for molecules.

29.05.2025 17:04 👍 58 🔁 11 💬 8 📌 3

Want to improve your protein or genomic language model’s performance at zero-shot variant effect prediction? We propose a simple adjustment to likelihood-based prediction.

26.05.2025 17:33 👍 24 🔁 6 💬 0 📌 0
Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences To build effective therapeutics, biologists iteratively mutate antibody sequences to improve binding and stability. Proposed mutations can be informed by previous measurements or by learning from larg...

Details and more results in the paper, trained models on github! Great working with the Big Hat team: Hunter Elliot, Ani Raghu, Calvin McCarter, Peyton Greenside! (Also got outstanding poster at AIDrugX at NeurIPS!) 7/7
arXiv: arxiv.org/abs/2412.07763
Code: github.com/AlanNawzadAm...

17.12.2024 16:01 👍 6 🔁 0 💬 0 📌 0
Post image

Finally, we validated CloneBO in vitro! We did one round of designs and tested them in the lab, comparing against the next best method. We see that CloneBO’s designs improve stability and significantly beat LaMBO-Ab in binding. 6/7

17.12.2024 16:01 👍 4 🔁 0 💬 1 📌 0
Post image

To use our prior to optimize an antibody, we now need to generate clonal families that match measurements in the lab – bad mutations should be unlikely and good mutations likely. We developed a twisted sequential Monte Carlo approach to efficiently sample from this posterior. 5/7

17.12.2024 16:01 👍 2 🔁 0 💬 1 📌 0
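A generic (untwisted) sequential Monte Carlo step, as a toy stand-in for the twisted sampler described in the post above. The particles, proposal, and potential here are purely illustrative: binary "mutation" strings whose potential rewards 1s, standing in for agreement with lab measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

def smc_step(particles, log_w, propose, log_potential):
    """One SMC step: extend every particle, reweight by a potential,
    then resample so compute focuses on promising particles."""
    particles = [propose(p) for p in particles]
    log_w = log_w + np.array([log_potential(p) for p in particles])
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return [particles[i] for i in idx], np.zeros(len(particles))

# Toy run: each proposal appends a random bit; the potential rewards 1s.
propose = lambda p: p + [int(rng.integers(0, 2))]
log_potential = lambda p: 2.0 * p[-1]

particles = [[] for _ in range(64)]
log_w = np.zeros(64)
for _ in range(10):
    particles, log_w = smc_step(particles, log_w, propose, log_potential)

mean_ones = np.mean([np.mean(p) for p in particles])  # pulled well above 0.5
```

Twisting goes further than this sketch: the proposal itself is steered toward high-potential futures rather than relying on reweighting alone, which keeps the effective sample size high.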
Post image

We train a transformer model to generate entire clonal families – CloneLM. Prompting with a single sequence, CloneLM samples realistic clonal families. These samples represent a prior on possible evolutionary trajectories in the immune system. 4/7

17.12.2024 16:01 👍 1 🔁 0 💬 1 📌 0
Post image

Our bodies make antibodies by evolving specific portions of their sequences to bind their target strongly and stably, resulting in a set of related sequences known as a clonal family. We leverage modern software and data to build a dataset of nearly a million clonal families! 3/7

17.12.2024 16:01 👍 1 🔁 0 💬 1 📌 0
Post image


SoTA methods search the space of sequences by iteratively suggesting mutations. But the space of antibodies is huge! CloneBO builds a prior on mutations that make strong and stable binders in our body to optimize antibodies in silico. 2/7

17.12.2024 16:01 👍 1 🔁 0 💬 1 📌 0
Post image

How do you go from a hit in your antibody screen to a suitable drug? Now introducing CloneBO: we optimize antibodies in the lab by teaching a generative model how we optimize them in our bodies!
w/ Nat Gruver, Yilun Kuang, Lily Li, @andrewgwils.bsky.social and the team at Big Hat! 1/7

17.12.2024 16:01 👍 11 🔁 1 💬 1 📌 1
Post image

New model trained on new dataset of nearly a million evolving antibody families at AIDrugX workshop Sunday at 4:20 pm (#76) #NeurIPS! Collab between @andrewgwils.bsky.social and BigHatBio. Stay tuned for full thread on how we used the model to optimize antibodies in the lab in coming days!

14.12.2024 16:32 👍 4 🔁 1 💬 0 📌 0