We're still newcomers in this space, and the literature is vast, so if you think we've missed something, or have any key citations to suggest, please let us know and we'll be happy to update the preprint!
(We also show this for Edit Flows, which lies just outside of Generator Matching)
arxiv.org/abs/2506.09018
This means that if you're training a flow matching or diffusion model, you can happily rescale your loss with any (positive almost everywhere and integrable) scaling function, and the theoretical guarantees from Generator Matching will still hold.
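To make that concrete, here is a minimal Python sketch (plain NumPy; the names `weighted_cfm_loss` and `weight_fn` are our illustrative choices, using a standard conditional flow matching loss with a linear path, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_cfm_loss(model, x0, x1, weight_fn, rng):
    """Conditional flow-matching loss with a time-dependent weight.

    weight_fn(t) may be any scaling that is positive almost
    everywhere and integrable; it changes the emphasis across t,
    not the per-timestep minimizer.
    """
    t = rng.uniform(size=(x0.shape[0], 1))
    xt = (1.0 - t) * x0 + t * x1   # linear conditional path
    target = x1 - x0               # conditional velocity field
    pred = model(xt, t)
    per_sample = np.sum((pred - target) ** 2, axis=1)
    return np.mean(weight_fn(t[:, 0]) * per_sample)

# Toy "model" that predicts a constant velocity, just to run the loss.
model = lambda xt, t: np.ones_like(xt)
x0 = rng.normal(size=(4, 2))
x1 = rng.normal(size=(4, 2))
loss = weighted_cfm_loss(model, x0, x1, lambda t: 1.0 / (1.0 - t + 1e-3), rng)
```

Because the weight multiplies each timestep's term independently, swapping `weight_fn` reweights which times dominate training without changing what is optimal at any single t.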
For this, we closely follow the incredible Generator Matching framework, which subsumes nearly all flow matching and diffusion models.
arxiv.org/abs/2410.20587
But when we looked, we couldn't find a sufficiently general justification.
So our preprint, driven by @lukasbillera.bsky.social with assists from @hedwignordlinder.bsky.social, formalizes this, and extends it a little in ways that are trickier to heuristically reason about:
arxiv.org/abs/2511.16599
We want things to be a bit less handwavey than this though, and we needed it for Branching Flows.
bsky.app/profile/benj...
Intuition suggests that it should be safe to re-weight your loss differently across time: if you trained a different model for each timestep, a time-dependent scaling would just be a constant for each model, and wouldn't change what is learned.
A technical thread on loss scaling in diffusion and flow matching models (related to a new preprint):
Since the dawn of time, people have been messing with (or dropping entirely) these pesky time-dependent loss scaling terms, mostly because the models train better without them.
We have antibody sequences (which you can read like text - maybe I'll pull out some sample trajectories and drop them in here) and we're getting started with text proper. Images are a less natural fit because of their grid structure!
Remind me to show you the twitter screenshot someone pinned on the wall in my office after my first Omicron neut thread 🤣
Ah @hedwignordlinder.bsky.social is on here too!
Oh and as usual, these visualizations (which are, to us, key to understanding these processes) were made by @antonoresten.bsky.social in @makie.org (@simi.bsky.social).
With my wonderful lab, who mostly aren't on here (except @lukasbillera.bsky.social and @antonoresten.bsky.social ?) we've been tinkering in this space since the end of the summer, but we think this is just too cool to sit on any longer.
The manuscript should be up by tomorrow and I'll drop a link.
We've got a flexible implementation in Julia (github.com/MurrellGroup...) that uses our Flowfusion.jl package so you can compose families of base flows with the branching and deletion process. If there is community interest, we can set up a python implementation as well.
E.g. here is a protein example where the model designs two domains with an intervening linker (light blue chain in vid). A regular flow model would need to know, early on, exactly how many AAs are needed in the linker, but Branching Flows can decide on the fly and grow or shrink it.
We dislike ad hoc padding in discrete diffusion models, and the situation is even worse in continuous domains. Branching Flows removes this blemish. We also expect that this makes the trajectories easier to learn.
Given that all of the deletions, trees, anchors, etc are in Z, which doesn't interact with the theory, it is easy to manipulate the trajectories the model is learning.
Autocorrelated insertions? Change how you build the trees! Same for the "anchors" which control the process on internal branches.
You can see the branching pattern in the static plots "with trails", and if you stare at them you can see where lineages delete too.
The process can be seen clearly with a QM9-trained model (continuous atom positions, discrete atom types), starting from a single atom.
Side note: this is heavily inspired by the processes from phylogenetics (the other thing my lab works on).
The trick that makes it tractable to learn, e.g. for continuous states, is that branching events are not generic insertions into space (which, we think, you need for TD jumps; @arnauddoucet.bsky.social?). Instead, they duplicate the state that is branching.
Then on the internal nodes of the trees, we place "anchor" states (same space as the X1 elements). We put all that in Z.
Then, given this Z, Xt evolves over the trees, sampling when (but not which) branching and deletion events occur, all constructed to terminate at X1.
With Branching Flows, we draw X1 from the data. Then we draw X0 (one element? many elements? it all works). Then we add "to-be-deleted" nodes into X1. Then we draw a forest of trees, one per X0 element, that maps (one to many) the X0 elements to all the X1s (plus to-be-deleteds).
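A very rough, purely illustrative sketch of that setup (a hypothetical helper, not the actual Flowfusion.jl API: pad X1 with "to-be-deleted" placeholder leaves, then randomly assign each leaf to one of the X0 roots):

```python
import random

random.seed(0)

def build_Z(x1_elements, n_x0, n_deleted, rng):
    # Hypothetical sketch: pad X1 with "to-be-deleted" placeholder
    # leaves, then assign every leaf to one of n_x0 roots, giving a
    # forest with one tree per X0 element (a one-to-many mapping
    # from X0 elements to all the X1s plus to-be-deleteds).
    leaves = [("data", x) for x in x1_elements]
    leaves += [("to_be_deleted", None)] * n_deleted
    forest = {root: [] for root in range(n_x0)}
    for leaf in leaves:
        forest[rng.randrange(n_x0)].append(leaf)
    return forest

forest = build_Z(["A", "B", "C"], n_x0=2, n_deleted=1, rng=random)
```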
Typically Z will include X0 from an easy-to-sample distribution and X1 from the data distribution, and if you're feeling fancy you might couple them (@alextong.bsky.social). But what we learned while working on Branching Flows is that Z is a *playground*.
How does this work? First, you need to understand the amazing Generator Matching (arxiv.org/abs/2410.20587). In GM you first sample Z, and then construct a stochastic process Xt that, conditioned on Z, terminates at the data distribution.
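As a toy illustration of that two-step recipe (all names here, and the linear conditional path, are our illustrative choices, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_Z(rng):
    # Z bundles everything we condition on: here just (x0, x1).
    x0 = rng.normal(size=2)        # point from an easy base distribution
    x1 = rng.normal(size=2) + 5.0  # stand-in for a data sample
    return x0, x1

def sample_xt_given_Z(Z, t):
    # Conditional process: a deterministic linear path that is
    # guaranteed to terminate at x1 when t = 1.
    x0, x1 = Z
    return (1.0 - t) * x0 + t * x1

Z = sample_Z(rng)
xt_end = sample_xt_given_Z(Z, 1.0)  # lands exactly on x1
```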
Another infix, this time a nanobody CDR3.
And this is "infix sampling", but where you let the model figure out how many amino acids are needed to span the gaps.
Another binder, which threads a tail down a groove.
For example, here is what "binder design" looks like when you start from a single amino acid.
This is our "graphical abstract":