Bruce (Zhi) Wen

@zhi-bruce-wen

Senior applied research scientist at Mila. NLP. ML for healthcare/bio. Football. Pink Floyd. Post-rock. Montreal bagel ambassador 🇨🇦. https://zhi-wen.net/

258 Followers
162 Following
60 Posts
Joined 06.11.2024

Latest posts by Bruce (Zhi) Wen @zhi-bruce-wen

Every time you read an old paper you find out the author somewhere was like “this was funnily enough inspired by a conversation with my lovely sweet little wife who also happens to have a PhD in the same topic”

13.03.2026 13:06 👍 10 🔁 4 💬 1 📌 1

Excited to share the latest Olmo model: Olmo Hybrid. This is a model with gated delta net (GDN) layers in a 3:1 ratio with full attention. It follows lots of other developments like Qwen 3.5 and Kimi Linear.

05.03.2026 16:26 👍 67 🔁 8 💬 6 📌 4
CLIP Is Shortsighted: Paying Attention Beyond the First Sentence
CLIP models learn transferable multi-modal features via image-text contrastive learning on internet-scale data. They are widely used in zero-shot classification, multi-modal retrieval, text-to-image d...

Our freshly accepted CVPR 2026 paper is up on arXiv 😍. Project page, open code and more detailed post coming soon!

« CLIP is Shortsighted: Paying Attention Beyond the First Sentence ».

#CVPR

27.02.2026 03:40 👍 5 🔁 2 💬 0 📌 1

Lots of core team members of Alibaba Qwen are resigning publicly on X.

The gaping hole that Qwen imploding would leave in the open research ecosystem will be hard to fill. The small models are irreplaceable.

I’ll do my best to keep carrying that torch. Every bit matters.

03.03.2026 18:10 👍 105 🔁 11 💬 3 📌 2

Café Caron & frères is great too, also in Little Italy. Closer to downtown, Pikolo has been consistently good for years. Word (The) is a nice little bookstore near McGill.

15.02.2026 03:02 👍 1 🔁 0 💬 1 📌 0

In Montreal my favourite cafe is Cafe Larue & fils, which is in Little Italy. They don’t like credit cards and don’t have the coziest vibes but the coffee is so good! Btw check out Boulangerie Le Pain dans les Voiles which is on the same street if you’re there.

15.02.2026 03:02 👍 2 🔁 0 💬 1 📌 0

our paper on data mixing for LMs is out!

while building Olmo 3, we saw gaps between data mixing literature and real practice

🐠choosing proxy size, # runs, sampling, regression, constraints..
🐟data shifts during LM dev: can we reuse past experiments?

Olmix tackles them all!

13.02.2026 17:30 👍 29 🔁 4 💬 1 📌 0

@karpathy.bsky.social 's microgpt.py

Train and inference GPT in 243 lines of pure, dependency-free Python.

gist.github.com/karpathy/862...

11.02.2026 23:56 👍 86 🔁 15 💬 1 📌 3

Without her work, the “intricately accurate” navigation and timing of GPS would not have been possible. ❤️

02.02.2026 17:55 👍 52 🔁 21 💬 3 📌 0

Introducing Theorizer: Turning thousands of papers into scientific laws 📚➡️📜

Most automated discovery systems focus on experimentation. Theorizer tackles the other half of science: theory building, compressing scattered findings into structured, testable claims. 🧵

28.01.2026 18:37 👍 33 🔁 8 💬 1 📌 5

I can report every time I had a good climbing session where I climbed a route I couldn’t before (no guarantee how often that happens)

17.01.2026 05:36 👍 2 🔁 0 💬 0 📌 0

Hello all! 👋

I’m delighted to share a 🚨 new preprint 🚨:

“Active Evaluation of General Agents: Problem Definition and Comparison of Baseline Algorithms”.

A paper thread! 🤩📄🧵 1/N

15.01.2026 12:49 👍 47 🔁 11 💬 2 📌 2

One of my favorite findings: Positional embeddings are just training wheels. They help convergence but hurt long-context generalization.

We found that if you simply delete them after pretraining and recalibrate for <1% of the original budget, you unlock massive context windows. Smarter, not harder.

12.01.2026 04:12 👍 220 🔁 32 💬 8 📌 1

"They showed the dead fish pictures of humans in social situations and "asked" the fish to determine the emotions of the people. When they ran their standard statistical software, the results showed "brain activity" in the fish that correlated with the emotions."

09.01.2026 13:15 👍 31 🔁 7 💬 1 📌 2

Please welcome Google's Open Source efforts to Bluesky at @opensource.google!

07.01.2026 21:12 👍 247 🔁 38 💬 7 📌 4

Btw Howtown has been doing really solid videos on interesting topics and has become one of my favourite channels on all of YouTube.

31.12.2025 04:10 👍 0 🔁 0 💬 0 📌 0
A screenshot from Rick Astley's Never Gonna Give You Up music video

A photograph of the seminary gym in Knives Out. It has the same window.

Just watched the new Knives Out and I think it's really important you know that the scene in the Seminary's Gym is filmed in the same place Rick Astley filmed the music video for Never Gonna Give You Up.

I saw the window tracery and immediately made my friends pause the film so I could tell them.

29.12.2025 13:11 👍 29043 🔁 7350 💬 430 📌 529
OpenReview
Promoting openness in scientific communication and the peer-review process

OpenReview is a pillar of progress in the AI research community. Now it needs our support.

Along with several of my colleagues, I have pledged to help, and I encourage anyone who can to do the same.

openreview.net/donate

19.12.2025 19:58 👍 27 🔁 6 💬 1 📌 1
Sparse Autoencoders are Topic Models
Sparse autoencoders (SAEs) are used to analyze embeddings, but their role and practical value are debated. We propose a new perspective on SAEs by demonstrating that they can be naturally understood a...

Omg wait. Someone literally posted this paper a couple weeks ago. Good job guys

15.12.2025 23:00 👍 19 🔁 2 💬 1 📌 0

Introducing Bolmo, a new family of byte-level language models built by "byteifying" our open Olmo 3 and, to our knowledge, the first fully open byte-level LM to match or surpass SOTA subword models across a wide range of tasks. 🧵

15.12.2025 17:19 👍 75 🔁 15 💬 1 📌 4

Interesting choice, curious to see future interp papers from you

14.12.2025 13:19 👍 1 🔁 0 💬 0 📌 0

Olmo 3.1 32B Think shows that not just frontier labs can scale RL.
My favorite RL run yet over 7+ years of doing RL.
The biggest fully open RL run ever?

Gold stars on downstream evals are our original release; this latest one is the final checkpoint on the plot.

12.12.2025 17:15 👍 24 🔁 4 💬 1 📌 1

This post is kicking off some interesting discussion & self-reflection but a problem I see in the post and the replies is a conflation of two separate things: we want humans to do science because of the joy & meaning it gives them vs. humans must remain in control of the scientific process

10.12.2025 14:32 👍 29 🔁 2 💬 4 📌 1
The Government of Canada introduces new programs for international researchers - Canada.ca

It's happening! Canada launched two programs to recruit international researchers.

Canada Impact+ Research Chairs (1 million/yr for 8 yrs +)
Canada Impact+ Emerging Leaders.

I will do my best to facilitate the process for those interested. Hit me up.

www.canada.ca/en/impact-pl...

09.12.2025 18:06 👍 199 🔁 115 💬 4 📌 20

happy stylish but illegal monkey day to all who celebrate!!!!! by the way his name is darwin and he lives on a farm now

09.12.2025 18:29 👍 2035 🔁 691 💬 21 📌 54

Good researchers obsess over evals
The story of Olmo 3 (post-training), told through evals
NeurIPS Talk tomorrow.
Upper Level Room 2, 10:35AM.
Slides: docs.google.com/presentation...

06.12.2025 20:35 👍 30 🔁 6 💬 1 📌 0

We present Olmo 3, our next family of fully open, leading language models.
This family of 7B and 32B models represents:

1. The best 32B base model.
2. The best 7B Western thinking & instruct models.
3. The first 32B (or larger) fully open reasoning model.

20.11.2025 14:32 👍 107 🔁 24 💬 3 📌 3

Please report back how many survived this one

20.11.2025 12:12 👍 1 🔁 0 💬 0 📌 0

Today we’re releasing Deep Research Tulu (DR Tulu), the first fully open, end-to-end recipe for long-form deep research, plus an 8B agent you can use right away. Train agents that plan, search, synthesize, & cite across sources, making expert research more accessible. 🧭📚

18.11.2025 15:31 👍 48 🔁 14 💬 1 📌 3

COLM is going to San Francisco for 2026!

🗓️ Dates: October 6-9, 2026
🏨 Venue: Hilton San Francisco Union Square

Website and CFPs for papers and workshops coming up soon!

11.11.2025 19:30 👍 21 🔁 6 💬 0 📌 1