Huge thanks to my amazing coauthors 🙏 Jiayi Wu, @taylor-sorensen.bsky.social, Jiaxin Pei, and @mbakker.bsky.social!
Excited to keep pushing on pluralistic alignment. Please reach out if you want to connect 💬🤝
Paper: arxiv.org/abs/2512.01351
Website: overtonbench.github.io
9/9
10.03.2026 17:43
Inspired by @bennokrojer.bsky.social, we included a Behind the Scenes section 🎬
The goal is to make science more transparent, share lessons learned 🧠, and provide a more realistic lens on the research journey
8/
bsky.app/profile/benn...
10.03.2026 17:43
However, human studies aren't scalable 💰
We build + validate an LLM-as-judge that approximates human representation scores, so you can use OvertonBench without running a new study each time
We open-source our code to foster development of more pluralistic LLMs
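A minimal sketch of the judge idea in Python (my illustration, not the released code; the `llm` callable and the prompt wording are assumptions):

```python
def judge_covered(llm, viewpoint: str, response: str) -> bool:
    """Ask a judge LLM whether `response` represents `viewpoint`.

    `llm` is any prompt -> text callable (a placeholder for your model
    API); the paper's judge is validated against human scores.
    """
    prompt = (
        "Viewpoint: " + viewpoint + "\n"
        "Response: " + response + "\n"
        "Does the response represent this viewpoint? Answer YES or NO."
    )
    return llm(prompt).strip().upper().startswith("YES")

def judged_coverage(llm, viewpoints: list[str], response: str) -> float:
    """LLM-judge approximation of the human set-coverage score."""
    return sum(judge_covered(llm, v, response) for v in viewpoints) / len(viewpoints)
```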
7/
10.03.2026 17:43
A key finding: neutral ≠ pluralistic
A politically balanced or neutral response can still fail to represent large swaths of viewpoints
We find political slant and pluralism are negatively correlated and distinct concepts
6/
10.03.2026 17:43
So how do current models do?
Best-performing models score 0.35–0.41, well below the maximum of 1
A lot of room to grow! In the paper we discuss interesting variation across models and topics, pointing to where alignment efforts should focus
5/
10.03.2026 17:43
To determine distinct viewpoints, we ran a 1,200+ person US-representative human study 🧑‍🤝‍🧑 and clustered participants by viewpoint
💡 Key: instead of algorithmic clustering, users vote to group themselves. This pol.is-inspired design is more faithful to the underlying perspectives
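A toy sketch of that self-grouping step as I read it (the data shapes are invented and the real study protocol is richer): participants endorse the statement that best matches their view, and groups fall directly out of those endorsements, with no clustering algorithm in between.

```python
from collections import defaultdict

def self_group(endorsements: dict[str, str]) -> dict[str, list[str]]:
    """Group participants by the viewpoint statement they endorsed."""
    groups: dict[str, list[str]] = defaultdict(list)
    for participant, statement in endorsements.items():
        groups[statement].append(participant)
    return dict(groups)

# Example: three participants sort themselves into two viewpoint groups.
print(self_group({"p1": "raise taxes", "p2": "cut taxes", "p3": "raise taxes"}))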
4/
10.03.2026 17:43
To operationalize this, we introduce a set-coverage metric
For each question, we calculate the proportion of distinct viewpoints 🗣️ covered by each model response.
We determine coverage by directly asking humans whether their POV is represented in the model response
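For concreteness, the metric as a set ratio (a minimal sketch of my own, not code from the paper, assuming binary per-viewpoint human judgments):

```python
def set_coverage(distinct_viewpoints: set[str], covered: set[str]) -> float:
    """Proportion of a question's distinct viewpoints that a model response
    covers, where "covered" means humans holding that viewpoint said it
    was represented."""
    return len(covered & distinct_viewpoints) / len(distinct_viewpoints)

# A response covering 2 of 5 distinct viewpoints scores 0.4.
print(set_coverage({"v1", "v2", "v3", "v4", "v5"}, {"v1", "v2"}))
```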
3/
10.03.2026 17:43
OvertonBench measures Overton pluralism:
For a subjective query, to what extent does a model's response represent the ✨full✨ range of reasonable viewpoints?
2/
10.03.2026 17:43
There's been a lot of excitement about pluralistic value alignment: AI that reflects the full range of human perspectives
But no formal way to benchmark whether we're actually making progress. 🤔
Introducing OvertonBench, accepted to #ICLR2026!
1/n 🧵
10.03.2026 17:43
Do LLMs Benefit from Their Own Words? 🤔
In multi-turn chats, models are typically given their own past responses as context.
But do their own words always help…
Or are they more often a waste of compute and a distraction?
🧵
arxiv.org/abs/2602.24287
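To make the setup concrete, a minimal sketch of the two context conditions in the standard chat-message format (the user-only variant is my guess at a natural ablation, not necessarily the paper's exact conditions):

```python
# Conversation history in the standard chat format.
history = [
    {"role": "user", "content": "Summarize the results section."},
    {"role": "assistant", "content": "The model improves on 3 of 5 tasks..."},
    {"role": "user", "content": "Which tasks regressed?"},
]

# Standard practice: the model sees its own past responses.
with_own_words = history

# Ablation: strip assistant turns, keeping only the user's side.
without_own_words = [turn for turn in history if turn["role"] == "user"]
```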
09.03.2026 14:13
Title, author list, and two figures from the paper.
Title: The Aftermath of DrawEduMath: Vision Language Models Underperform with Struggling Students and Misdiagnose Errors
Authors: Li Lucy, Albert Zhang, Nathan Anderson, Ryan Knight, Kyle Lo
Figure 1: On the left is a math problem, where students are asked to draw x < 5/2 on a number line. The right side shows two example student responses that differ in correctness. DrawEduMath pairs each math problem with one student response, and prompts VLMs to answer questions about the student response.
Figure 2: VLMs consistently perform worse on answering DrawEduMath benchmark questions pertaining to erroneous student responses. Performance on non-erroneous student responses is labeled with specific VLMs' names; that same model's performance on erroneous student responses is directly below.
Models are now expert math solvers, and so AI for math education is receiving increasing attention.
Our new preprint evaluates 11 VLMs on our QA benchmark, DrawEduMath. We highlight a startling gap: models perform less well on inputs from K-12 students who need more help. 🧵
03.03.2026 03:08
Yesterday was my last day at MSR. We recently learned that our roles were eliminated, and with them our little FATE Montreal team.
I joined MSR a bit over 7.5 years ago while on active chemotherapy, and being at MSR has overlapped with so much change in my life.
03.03.2026 19:49
Our paper, "What's in My Human Feedback", received an oral presentation at ICLR!
Our method automatically+interpretably identifies preferences in human feedback data; we use this to improve personalization + safety.
Reach out if you have data/use cases to apply this to!
arxiv.org/pdf/2510.26202
26.02.2026 19:27
Finally, we do test it empirically, finding some models where the LLM's embedding matrix already provides decently interpretable nearest neighbors
But this was not the full story yet...
@mariusmosbach.bsky.social and @elinorpd.bsky.social nudged me to use contextual embeddings
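A quick sketch of that nearest-neighbor check (the model choice, query token, and use of Hugging Face transformers are my assumptions, not the original setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # illustrative model choice
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# Input-embedding matrix, normalized so dot products are cosine similarities.
E = torch.nn.functional.normalize(model.get_input_embeddings().weight, dim=-1)

query_id = tok.encode(" science")[0]       # first token id of " science"
sims = E @ E[query_id]                     # similarity to every vocab token
top_ids = sims.topk(6).indices.tolist()
print([tok.decode([i]) for i in top_ids])  # nearest neighbors, query first
```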
11.02.2026 15:10
Really cool new work with surprising results! Highly recommend checking out the demo
11.02.2026 15:20
Grok fact-checks our paper on Grok fact-checking - and it approves!
04.02.2026 13:49
🎭 How do LLMs (mis)represent culture?
🧮 How often?
🧠 Misrepresentations = missing knowledge? Spoiler: NO!
At #CHI2026 we are bringing ✨TALES✨, a participatory evaluation of cultural (mis)representations & knowledge in multilingual LLM stories for India
arxiv.org/abs/2511.21322
1/10
02.02.2026 21:38
This is amazing! Made quick NYC & Boston posters
30.01.2026 21:05
Potato is a great platform for researchers! Highly recommend (plus a great development team behind it)
30.01.2026 15:41
Microsoft Research NYC is hiring a researcher in the space of AI and society!
29.01.2026 23:27
I've had a similar experience except with knitting / crocheting!
29.01.2026 18:21
Federal agents with weapons drawn, moments before murdering American citizens on the streets of Minneapolis at the dawn of 2026.
What should academics be doing right now?
I have been writing up some thoughts on what the research says about effective action, and what universities specifically can do.
davidbau.github.io/poetsandnurs...
It's on GitHub. Suggestions and pull requests welcome.
github.com/davidbau/poe...
26.01.2026 03:27
Whoa! That's a nice view! Or… well, I'm sure it's nice on a clear day
26.01.2026 05:58
I'll be presenting this work on January 25th (Hall 2, poster 41) at #AAAI2026 in Singapore!
Please stop by and reach out if you'd like to chat
23.01.2026 14:49
https://arxiv.org/abs/2406.17737
Work done with Deb Roy and Jad Kabbara
@jad-kabbara.bsky.social
at @mit.edu @medialab.bsky.social
23.01.2026 14:42
This pattern, which we refer to as targeted underperformance, suggests that models systematically lower information quality for some users.
As LLMs increasingly mediate access to knowledge 🧠, these dynamics risk amplifying epistemic inequity at scale.
6/6
23.01.2026 14:42
Here's one concrete example:
The same factual SciQ question posed to Claude
✅ Answered for a control user (no bio)
❌ Refused for a less-educated Russian user
5/6
23.01.2026 14:42
Across models, we observe systematic drops in accuracy and truthfulness for users who are:
• Less educated
• Non-native English speakers
• From outside the U.S.
These effects compound and are largely invisible to standard evaluations.
4/6
23.01.2026 14:42
We evaluated GPT-4, Claude Opus, and Llama-3-8B in a multiple-choice setup with questions taken from TruthfulQA and SciQ. Each question is conditioned on a user bio in which we vary three user traits (sketch below):
• Education level 🎓
• Country of origin 🌍
• English proficiency 🗣️
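A hedged sketch of the bio-conditioning (the trait values and template wording here are mine, not the paper's exact templates):

```python
from itertools import product

EDUCATION = ["high school", "PhD"]
COUNTRY = ["the US", "Russia"]
ENGLISH = ["a native", "a non-native"]

def build_prompt(question: str, edu: str, country: str, english: str) -> str:
    """Prepend a synthetic user bio that varies the three traits."""
    bio = (f"About the user: {edu} education, from {country}, "
           f"{english} English speaker.")
    return f"{bio}\n\n{question}\nAnswer with the letter of the correct option."

# One prompt per trait combination, for a fixed multiple-choice question.
prompts = [build_prompt("What gas do plants absorb? (A) O2 (B) CO2", *t)
           for t in product(EDUCATION, COUNTRY, ENGLISH)]
```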
3/6
23.01.2026 14:42
Spoiler alert: we find the answer is often no! ⚠️
LLM accuracy and truthfulness systematically degrade for some users in ways that standard benchmarks, focused on best-case performance, fail to capture.
2/6
23.01.2026 14:42