Check out more details below!
Paper: arxiv.org/pdf/2603.00045
Project Page & Code: codd-dllm.github.io
Huge thanks to my amazing collaborators and advisors who made this work possible: @zoeshao.bsky.social @benjiewang.bsky.social @yuqirose.bsky.social @guyvdb.bsky.social @anjiliu.bsky.social
04.03.2026 06:29
RL-based methods push reasoning performance but demand 150+ GPU hours to converge. CoDD achieves highly competitive gains at a fraction of that computational cost.
As a plug-and-play module trained on frozen backbone activations, it converges in just ~3 hours.
04.03.2026 06:25
At inference time, CoDD adds far less overhead than finetuning, and it matters most at low compute budgets. At 64 steps, where standard methods frequently mode-collapse into repetition, CoDD sustains coherent reasoning:
04.03.2026 06:25
Instead of forcing the Transformer backbone to build a joint distribution from scratch, we augment it with a tractable probabilistic inference layer (structured as a probabilistic circuit). The LLM handles the complex semantics, while the tractable layer handles the joint dependencies.
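To make the "tractable layer" concrete, here is a minimal sketch of the simplest probabilistic circuit, a mixture over fully factorized components. The vocabulary, component distributions, and mixture weights are all invented for illustration (in the actual method the circuit sits on frozen backbone activations); the point is only that a small sum-of-products layer can put mass on coherent token pairs while assigning zero to incoherent ones:

```python
# Hypothetical toy circuit: p(t1, t2) = sum_k w_k * p_k(t1) * p_k(t2).
# Each component k is fully factorized, but the mixture as a whole
# models the dependency between the two masked positions.
# Toy vocabulary: 0="San", 1="New", 2="Francisco", 3="York".

weights = [0.5, 0.5]  # mixture weights (assumed, not learned here)
components = [
    # component 0: puts its mass on "San" + "Francisco"
    ([1.0, 0.0, 0.0, 0.0],   # distribution over token 1
     [0.0, 0.0, 1.0, 0.0]),  # distribution over token 2
    # component 1: puts its mass on "New" + "York"
    ([0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]),
]

def joint(t1, t2):
    """Joint probability of the two tokens under the mixture."""
    return sum(w * p1[t1] * p2[t2] for w, (p1, p2) in zip(weights, components))

print(joint(0, 2))  # "San Francisco" -> 0.5
print(joint(1, 3))  # "New York"      -> 0.5
print(joint(0, 3))  # "San York"      -> 0.0 (incoherent pair gets zero mass)
```

No single fully factorized distribution over these two positions can reproduce this joint: it would have to give "San York" probability at least as large as the product of the two marginals.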
04.03.2026 06:25
"He is from [MASK] [MASK]" β "San York"? dLLMs fail because they ignore token dependencies. This Factorization Barrier arises from a structural misspecification: models are restricted to fully factorized outputs. We break this barrier with CoDD, enabling coherent parallel generation. π
04.03.2026 06:25