Every time you read an old paper you find out the author somewhere was like "this was funnily enough inspired by a conversation with my lovely sweet little wife who also happens to have a PhD in the same topic"
Excited to share the latest Olmo model: Olmo Hybrid. This is a model with gated delta net (GDN) layers in a 3:1 ratio with full attention. It follows lots of other developments like Qwen 3.5 and Kimi Linear.
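A minimal sketch of what such a 3:1 interleaving could look like (hypothetical layer names and helper, not the actual Olmo Hybrid code): every fourth block uses full attention, the other three use a GDN-style linear block.

```python
# Hypothetical sketch of a 3:1 hybrid layer stack (not the Olmo Hybrid source):
# for every full-attention block, three gated-delta-net (GDN) blocks precede it.
def build_layer_types(num_layers: int, ratio: int = 3) -> list[str]:
    """Return a layer-type pattern with `ratio` GDN blocks per attention block."""
    return [
        "full_attention" if (i + 1) % (ratio + 1) == 0 else "gated_delta_net"
        for i in range(num_layers)
    ]

pattern = build_layer_types(8)
print(pattern)
```

With 8 layers this yields three GDN blocks, one attention block, then the pattern repeats.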
Our freshly accepted CVPR 2026 paper is up on arXiv. Project page, open code, and a more detailed post coming soon!
« CLIP is Shortsighted: Paying Attention Beyond the First Sentence ».
#CVPR
Lots of core team members of Alibaba Qwen are resigning publicly on X.
The gaping hole that Qwen imploding would leave in the open research ecosystem will be hard to fill. The small models are irreplaceable.
I'll do my best to keep carrying that torch. Every bit matters.
Café Caron & frères is great too, also in Little Italy. Closer to downtown, Pikolo has been consistently good for years. The Word is a nice little bookstore near McGill.
In Montreal my favourite cafe is Cafe Larue & fils, which is in Little Italy. They don't like credit cards and don't have the coziest vibes but the coffee is so good! Btw check out Boulangerie Le Pain dans les Voiles, which is on the same street if you're there.
our paper on data mixing for LMs is out!
while building Olmo 3, we saw gaps between the data-mixing literature and real practice:
- choosing proxy size, number of runs, sampling, regression, constraints...
- data shifts during LM dev: can we reuse past experiments?
Olmix tackles them all!
@karpathy.bsky.social 's microgpt.py
Train and run inference on a GPT in 243 lines of pure, dependency-free Python.
gist.github.com/karpathy/862...
Without her work, the "intricately accurate" navigation and timing of GPS would not have been possible.
Introducing Theorizer: Turning thousands of papers into scientific laws
Most automated discovery systems focus on experimentation. Theorizer tackles the other half of science, theory building: compressing scattered findings into structured, testable claims.
I can report every time I had a good climbing session where I climbed a route I couldn't before (no guarantee how often that happens)
Hello all!
I'm delighted to share a new preprint:
"Active Evaluation of General Agents: Problem Definition and Comparison of Baseline Algorithms".
A paper thread! 1/N
One of my favorite findings: Positional embeddings are just training wheels. They help convergence but hurt long-context generalization.
We found that if you simply delete them after pretraining and recalibrate for <1% of the original budget, you unlock massive context windows. Smarter, not harder.
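A toy illustration of the idea above (hypothetical model class, not the paper's code): a learned positional-embedding table is added to token embeddings during pretraining, then deleted so the model runs without positional information, after which a brief recalibration fine-tune would follow.

```python
# Toy sketch: delete a learned positional-embedding table after pretraining.
# All names here are hypothetical; real models would use tensors, not lists.
class ToyLM:
    def __init__(self, d_model: int, max_len: int):
        # Learned positional embeddings: the "training wheels" used in pretraining.
        self.pos_emb = [[0.01 * p] * d_model for p in range(max_len)]

    def embed(self, token_vecs):
        if self.pos_emb is None:
            # After deletion (NoPE): rely on attention alone for order information.
            return token_vecs
        return [
            [t + p for t, p in zip(tok, self.pos_emb[i])]
            for i, tok in enumerate(token_vecs)
        ]

    def remove_positional_embeddings(self):
        self.pos_emb = None  # then recalibrate for <1% of the original budget

model = ToyLM(d_model=2, max_len=4)
tokens = [[1.0, 1.0], [1.0, 1.0]]
with_pos = model.embed(tokens)
model.remove_positional_embeddings()
without_pos = model.embed(tokens)
print(with_pos, without_pos)
```

The point of the sketch is only the mechanics: before deletion, identical tokens at different positions get different embeddings; after deletion, they do not.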
"They showed the dead fish pictures of humans in social situations and "asked" the fish to determine the emotions of the people. When they ran their standard statistical software, the results showed "brain activity" in the fish that correlated with the emotions."
Please welcome Google's Open Source efforts to Bluesky at @opensource.google!
Btw Howtown has been making really solid videos on interesting topics and has become one of my favourite channels on all of YouTube.
A screenshot from Rick Astley's Never Gonna Give You Up music video
A photograph of the seminary gym in Knives Out. It has the same window.
Just watched the new Knives Out and I think it's really important you know that the scene in the Seminary's Gym is filmed in the same place Rick Astley filmed the music video for Never Gonna Give You Up.
I saw the window tracery and immediately made my friends pause the film so I could tell them.
OpenReview is a pillar of progress in the AI research community. Now it needs our support.
Along with several of my colleagues, I have pledged to help, and I encourage anyone who can to do the same.
openreview.net/donate
Omg wait. Someone literally posted this paper a couple weeks ago. Good job guys
Introducing Bolmo, a new family of byte-level language models built by "byteifying" our open Olmo 3. To our knowledge, it is the first fully open byte-level LM to match or surpass SOTA subword models across a wide range of tasks.
Interesting choice, curious to see future interp papers from you
Olmo 3.1 32B Think shows that it's not just frontier labs that can scale RL.
My favorite RL run yet over 7+ years of doing RL.
The biggest fully open RL run ever?
The gold stars on the downstream evals mark our original release; this latest one is the final checkpoint on the plot.
This post is kicking off some interesting discussion and self-reflection, but a problem I see in the post and the replies is a conflation of two separate things: wanting humans to do science because of the joy and meaning it gives them, versus keeping humans in control of the scientific process.
It's happening! Canada launched two programs to recruit international researchers.
Canada Impact+ Research Chairs (1 million/yr for 8 yrs +)
Canada Impact+ Emerging Leaders.
I will do my best to facilitate the process for those interested. Hit me up.
www.canada.ca/en/impact-pl...
happy stylish but illegal monkey day to all who celebrate!!!!! by the way his name is Darwin and he lives on a farm now
Good researchers obsess over evals
The story of Olmo 3 (post-training), told through evals
NeurIPS Talk tomorrow.
Upper Level Room 2, 10:35AM.
Slides: docs.google.com/presentation...
We present Olmo 3, our next family of fully open, leading language models.
This family of 7B and 32B models represents:
1. The best 32B base model.
2. The best 7B Western thinking & instruct models.
3. The first 32B (or larger) fully open reasoning model.
Please report back how many survived this one
Today we're releasing Deep Research Tulu (DR Tulu), the first fully open, end-to-end recipe for long-form deep research, plus an 8B agent you can use right away. Train agents that plan, search, synthesize, and cite across sources, making expert research more accessible.
COLM is going to San Francisco for 2026!
Dates: October 6-9, 2026
Venue: Hilton San Francisco Union Square
Website and CFPs for papers and workshops coming soon!