Every time you read an old paper you find out the author somewhere was like "this was funnily enough inspired by a conversation with my lovely sweet little wife who also happens to have a PhD in the same topic"
Excited to share the latest Olmo model: Olmo Hybrid. This is a model with gated delta net (GDN) layers in a 3:1 ratio with full attention. It follows lots of other developments like Qwen 3.5 and Kimi Linear.
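A minimal sketch of what such a 3:1 interleaving could look like (hypothetical layer names and helper, not the actual Olmo Hybrid code): every fourth block uses full attention, the other three use a GDN-style linear block.

```python
# Hypothetical sketch of a 3:1 hybrid layer stack (not the Olmo Hybrid source):
# for every full-attention block, three gated-delta-net (GDN) blocks precede it.
def build_layer_types(num_layers: int, ratio: int = 3) -> list[str]:
    """Return a layer-type pattern with `ratio` GDN blocks per attention block."""
    return [
        "full_attention" if (i + 1) % (ratio + 1) == 0 else "gated_delta_net"
        for i in range(num_layers)
    ]

pattern = build_layer_types(8)
print(pattern)
```

With 8 layers this yields three GDN blocks, one attention block, then the pattern repeats.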
Our freshly accepted CVPR 2026 paper is up on arXiv. Project page, open code, and a more detailed post coming soon!
« CLIP is Shortsighted: Paying Attention Beyond the First Sentence ».
#CVPR
Lots of core team members of Alibaba Qwen are resigning publicly on X.
The gaping hole that Qwen imploding would leave in the open research ecosystem will be hard to fill. The small models are irreplaceable.
I'll do my best to keep carrying that torch. Every bit matters.
Café Caron & frères is great too, also in Little Italy. Closer to downtown, Pikolo has been consistently good for years. The Word is a nice little bookstore near McGill.
In Montreal my favourite cafe is Cafe Larue & fils, which is in Little Italy. They don't like credit cards and don't have the coziest vibes but the coffee is so good! Btw check out Boulangerie Le Pain dans les Voiles, which is on the same street if you're there.
our paper on data mixing for LMs is out!
while building Olmo 3, we saw gaps between the data-mixing literature and real practice:
- choosing proxy size, number of runs, sampling, regression, constraints...
- data shifts during LM dev: can we reuse past experiments?
Olmix tackles them all!
@karpathy.bsky.social 's microgpt.py
Train and run inference on a GPT in 243 lines of pure, dependency-free Python.
gist.github.com/karpathy/862...
Without her work, the "intricately accurate" navigation and timing of GPS would not have been possible.
Introducing Theorizer: Turning thousands of papers into scientific laws
Most automated discovery systems focus on experimentation. Theorizer tackles the other half of science, theory building: compressing scattered findings into structured, testable claims.
I can report every time I had a good climbing session where I climbed a route I couldn't before (no guarantee how often that happens)
Hello all!
I'm delighted to share a new preprint:
"Active Evaluation of General Agents: Problem Definition and Comparison of Baseline Algorithms".
A paper thread! 1/N
One of my favorite findings: Positional embeddings are just training wheels. They help convergence but hurt long-context generalization.
We found that if you simply delete them after pretraining and recalibrate for <1% of the original budget, you unlock massive context windows. Smarter, not harder.
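A toy illustration of the idea above (hypothetical model class, not the paper's code): a learned positional-embedding table is added to token embeddings during pretraining, then deleted so the model runs without positional information, after which a brief recalibration fine-tune would follow.

```python
# Toy sketch: delete a learned positional-embedding table after pretraining.
# All names here are hypothetical; real models would use tensors, not lists.
class ToyLM:
    def __init__(self, d_model: int, max_len: int):
        # Learned positional embeddings: the "training wheels" used in pretraining.
        self.pos_emb = [[0.01 * p] * d_model for p in range(max_len)]

    def embed(self, token_vecs):
        if self.pos_emb is None:
            # After deletion (NoPE): rely on attention alone for order information.
            return token_vecs
        return [
            [t + p for t, p in zip(tok, self.pos_emb[i])]
            for i, tok in enumerate(token_vecs)
        ]

    def remove_positional_embeddings(self):
        self.pos_emb = None  # then recalibrate for <1% of the original budget

model = ToyLM(d_model=2, max_len=4)
tokens = [[1.0, 1.0], [1.0, 1.0]]
with_pos = model.embed(tokens)
model.remove_positional_embeddings()
without_pos = model.embed(tokens)
print(with_pos, without_pos)
```

The point of the sketch is only the mechanics: before deletion, identical tokens at different positions get different embeddings; after deletion, they do not.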
"They showed the dead fish pictures of humans in social situations and "asked" the fish to determine the emotions of the people. When they ran their standard statistical software, the results showed "brain activity" in the fish that correlated with the emotions."
Please welcome Google's Open Source efforts to Bluesky at @opensource.google!
Btw Howtown has been making really solid videos on interesting topics and has become one of my favourite channels on all of YouTube.
A screenshot from Rick Astley's Never Gonna Give You Up music video
A photograph of the seminary gym in Knives Out. It has the same window.
Just watched the new Knives Out and I think it's really important you know that the scene in the Seminary's Gym is filmed in the same place Rick Astley filmed the music video for Never Gonna Give You Up.
I saw the window tracery and immediately made my friends pause the film so I could tell them.
OpenReview is a pillar of progress in the AI research community. Now it needs our support.
Along with several of my colleagues, I have pledged to help, and I encourage anyone who can to do the same.
openreview.net/donate
Omg wait. Someone literally posted this paper a couple weeks ago. Good job guys
Introducing Bolmo, a new family of byte-level language models built by "byteifying" our open Olmo 3. To our knowledge, it is the first fully open byte-level LM to match or surpass SOTA subword models across a wide range of tasks.
Interesting choice, curious to see future interp papers from you
Olmo 3.1 32B Think shows that it's not just frontier labs that can scale RL.
My favorite RL run yet over 7+ years of doing RL.
The biggest fully open RL run ever?
The gold stars on the downstream evals mark our original release; this latest one is the final checkpoint on the plot.
This post is kicking off some interesting discussion and self-reflection, but a problem I see in the post and the replies is a conflation of two separate things: wanting humans to do science because of the joy and meaning it gives them, versus keeping humans in control of the scientific process.
It's happening! Canada launched two programs to recruit international researchers.
Canada Impact+ Research Chairs (1 million/yr for 8 yrs +)
Canada Impact+ Emerging Leaders.
I will do my best to facilitate the process for those interested. Hit me up.
www.canada.ca/en/impact-pl...
happy stylish but illegal monkey day to all who celebrate!!!!! by the way his name is Darwin and he lives on a farm now
Good researchers obsess over evals
The story of Olmo 3 (post-training), told through evals
NeurIPS Talk tomorrow.
Upper Level Room 2, 10:35AM.
Slides: docs.google.com/presentation...
We present Olmo 3, our next family of fully open, leading language models.
This family of 7B and 32B models represents:
1. The best 32B base model.
2. The best 7B Western thinking & instruct models.
3. The first 32B (or larger) fully open reasoning model.
Please report back how many survived this one
Today we're releasing Deep Research Tulu (DR Tulu), the first fully open, end-to-end recipe for long-form deep research, plus an 8B agent you can use right away. Train agents that plan, search, synthesize, and cite across sources, making expert research more accessible.
COLM is going to San Francisco for 2026!
Dates: October 6-9, 2026
Venue: Hilton San Francisco Union Square
Website and CFPs for papers and workshops coming soon!