Huge thanks to my amazing coauthors 🙏 Jiayi Wu, @taylor-sorensen.bsky.social, Jiaxin Pei, and @mbakker.bsky.social!
Excited to keep pushing on pluralistic alignment. Please reach out if you want to connect 💬🤝
Paper: arxiv.org/abs/2512.01351
Website: overtonbench.github.io
9/9
10.03.2026 17:43
Inspired by @bennokrojer.bsky.social, we included a Behind the Scenes section 🎬
The goal is to make science more transparent, share lessons learned 🧠, and provide a more realistic lens on the research journey
8/
bsky.app/profile/benn...
10.03.2026 17:43
However, human studies aren't scalable 💰
We build + validate an LLM-as-judge that approximates human representation scores, so you can use OvertonBench without running a new study each time
We open-source our code to foster development of more pluralistic LLMs
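A minimal sketch of the judge idea in Python (my illustration, not the released code; the `llm` callable and the prompt wording are assumptions):

```python
def judge_covered(llm, viewpoint: str, response: str) -> bool:
    """Ask a judge LLM whether `response` represents `viewpoint`.

    `llm` is any prompt -> text callable (a placeholder for your model
    API); the paper's judge is validated against human scores.
    """
    prompt = (
        "Viewpoint: " + viewpoint + "\n"
        "Response: " + response + "\n"
        "Does the response represent this viewpoint? Answer YES or NO."
    )
    return llm(prompt).strip().upper().startswith("YES")

def judged_coverage(llm, viewpoints: list[str], response: str) -> float:
    """LLM-judge approximation of the human set-coverage score."""
    return sum(judge_covered(llm, v, response) for v in viewpoints) / len(viewpoints)
```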
7/
10.03.2026 17:43
A key finding: neutral ≠ pluralistic
A politically balanced or neutral response can still fail to represent large swaths of viewpoints
We find political slant and pluralism are negatively correlated and distinct concepts
6/
10.03.2026 17:43
So how do current models do?
Best-performing models score 0.35–0.41, well below the maximum of 1
A lot of room to grow! In the paper we discuss interesting variation across models and topics, pointing to where alignment efforts should focus
5/
10.03.2026 17:43
To determine distinct viewpoints, we ran a 1,200+ person US-representative human study 🧑‍🤝‍🧑 and clustered participants by viewpoint
💡 Key: instead of algorithmic clustering, users vote to group themselves. This pol.is-inspired design is more faithful to the underlying perspectives
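A toy sketch of that self-grouping step as I read it (the data shapes are invented and the real study protocol is richer): participants endorse the statement that best matches their view, and groups fall directly out of those endorsements, with no clustering algorithm in between.

```python
from collections import defaultdict

def self_group(endorsements: dict[str, str]) -> dict[str, list[str]]:
    """Group participants by the viewpoint statement they endorsed."""
    groups: dict[str, list[str]] = defaultdict(list)
    for participant, statement in endorsements.items():
        groups[statement].append(participant)
    return dict(groups)

# Example: three participants sort themselves into two viewpoint groups.
print(self_group({"p1": "raise taxes", "p2": "cut taxes", "p3": "raise taxes"}))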
4/
10.03.2026 17:43
To operationalize this, we introduce a set-coverage metric
For each question, we calculate the proportion of distinct viewpoints 🗣️ covered by each model response.
We determine coverage by directly asking humans whether their POV is represented in the model response
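For concreteness, the metric as a set ratio (a minimal sketch of my own, not code from the paper, assuming binary per-viewpoint human judgments):

```python
def set_coverage(distinct_viewpoints: set[str], covered: set[str]) -> float:
    """Proportion of a question's distinct viewpoints that a model response
    covers, where "covered" means humans holding that viewpoint said it
    was represented."""
    return len(covered & distinct_viewpoints) / len(distinct_viewpoints)

# A response covering 2 of 5 distinct viewpoints scores 0.4.
print(set_coverage({"v1", "v2", "v3", "v4", "v5"}, {"v1", "v2"}))
```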
3/
10.03.2026 17:43
OvertonBench measures Overton pluralism:
For a subjective query, to what extent does a model's response represent the ✨full✨ range of reasonable viewpoints?
2/
10.03.2026 17:43
There's been a lot of excitement about pluralistic value alignment: AI that reflects the full range of human perspectives
But no formal way to benchmark whether we're actually making progress. 🤔
Introducing OvertonBench, accepted to #ICLR2026!
1/n 🧵
10.03.2026 17:43
Do LLMs Benefit from Their Own Words? 🤔
In multi-turn chats, models are typically given their own past responses as context.
But do their own words always help…
Or are they more often a waste of compute and a distraction?
🧵
arxiv.org/abs/2602.24287
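To make the setup concrete, a minimal sketch of the two context conditions in the standard chat-message format (the user-only variant is my guess at a natural ablation, not necessarily the paper's exact conditions):

```python
# Conversation history in the standard chat format.
history = [
    {"role": "user", "content": "Summarize the results section."},
    {"role": "assistant", "content": "The model improves on 3 of 5 tasks..."},
    {"role": "user", "content": "Which tasks regressed?"},
]

# Standard practice: the model sees its own past responses.
with_own_words = history

# Ablation: strip assistant turns, keeping only the user's side.
without_own_words = [turn for turn in history if turn["role"] == "user"]
```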
09.03.2026 14:13
Title, author list, and two figures from the paper.
Title: The Aftermath of DrawEduMath: Vision Language Models Underperform with Struggling Students and Misdiagnose Errors
Authors: Li Lucy, Albert Zhang, Nathan Anderson, Ryan Knight, Kyle Lo
Figure 1: On the left is a math problem, where students are asked to draw x < 5/2 on a number line. The right side shows two example student responses that differ in correctness. DrawEduMath pairs each math problem with one student response, and prompts VLMs to answer questions about the student response.
Figure 2: VLMs consistently perform worse on answering DrawEduMath benchmark questions pertaining to erroneous student responses. Performance on non-erroneous student responses is labeled with specific VLMs' names; that same model's performance on erroneous student responses is directly below.
Models are now expert math solvers, and so AI for math education is receiving increasing attention.
Our new preprint evaluates 11 VLMs on our QA benchmark, DrawEduMath. We highlight a startling gap: models perform less well on inputs from K-12 students who need more help. 🧵
03.03.2026 03:08
Yesterday was my last day at MSR. We recently learned that our roles were eliminated, and with them our little FATE Montreal team.
I joined MSR a bit over 7.5 years ago while on active chemotherapy, and being at MSR has overlapped with so much change in my life.
03.03.2026 19:49
Our paper, "What's in My Human Feedback", received an oral presentation at ICLR!
Our method automatically+interpretably identifies preferences in human feedback data; we use this to improve personalization + safety.
Reach out if you have data/use cases to apply this to!
arxiv.org/pdf/2510.26202
26.02.2026 19:27
Finally, we do test it empirically, finding some models where the LLM's embedding matrix already provides decently interpretable nearest neighbors
But this was not the full story yet...
@mariusmosbach.bsky.social and @elinorpd.bsky.social nudged me to use contextual embeddings
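A quick sketch of that nearest-neighbor check (the model choice, query token, and use of Hugging Face transformers are my assumptions, not the original setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # illustrative model choice
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# Input-embedding matrix, normalized so dot products are cosine similarities.
E = torch.nn.functional.normalize(model.get_input_embeddings().weight, dim=-1)

query_id = tok.encode(" science")[0]       # first token id of " science"
sims = E @ E[query_id]                     # similarity to every vocab token
top_ids = sims.topk(6).indices.tolist()
print([tok.decode([i]) for i in top_ids])  # nearest neighbors, query first
```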
11.02.2026 15:10
Really cool new work with surprising results! Highly recommend checking out the demo
11.02.2026 15:20
Grok fact-checks our paper on Grok fact-checking - and it approves!
04.02.2026 13:49
🎭 How do LLMs (mis)represent culture?
🧮 How often?
🧠 Misrepresentations = missing knowledge? Spoiler: NO!
At #CHI2026 we are bringing ✨TALES✨, a participatory evaluation of cultural (mis)representations & knowledge in multilingual LLM stories for India
arxiv.org/abs/2511.21322
1/10
02.02.2026 21:38
This is amazing! Made quick NYC & Boston posters
30.01.2026 21:05
Potato is a great platform for researchers! Highly recommend (plus a great development team behind it)
30.01.2026 15:41
Microsoft Research NYC is hiring a researcher in the space of AI and society!
29.01.2026 23:27
I've had a similar experience except with knitting / crocheting!
29.01.2026 18:21
Federal agents with weapons drawn, moments before murdering American citizens on the streets of Minneapolis at the dawn of 2026.
What should academics be doing right now?
I have been writing up some thoughts on what the research says about effective action, and what universities specifically can do.
davidbau.github.io/poetsandnurs...
It's on GitHub. Suggestions and pull requests welcome.
github.com/davidbau/poe...
26.01.2026 03:27
Whoa! That's a nice view! Or… well, I'm sure it's nice on a clear day
26.01.2026 05:58
I'll be presenting this work on January 25th (Hall 2, poster 41) at #AAAI2026 in Singapore!
Please stop by and reach out if you'd like to chat
23.01.2026 14:49
https://arxiv.org/abs/2406.17737
Work done with Deb Roy and Jad Kabbara
@jad-kabbara.bsky.social
at @mit.edu @medialab.bsky.social
23.01.2026 14:42
This pattern, which we refer to as targeted underperformance, suggests that models systematically lower information quality for some users.
As LLMs increasingly mediate access to knowledge 🧠, these dynamics risk amplifying epistemic inequity at scale.
6/6
23.01.2026 14:42
Here's one concrete example:
The same factual SciQ question posed to Claude
✅ Answered for a control user (no bio)
❌ Refused for a less-educated Russian user
5/6
23.01.2026 14:42
Across models, we observe systematic drops in accuracy and truthfulness for users who are:
• Less educated
• Non-native English speakers
• From outside the U.S.
These effects compound and are largely invisible to standard evaluations.
4/6
23.01.2026 14:42
We evaluated GPT-4, Claude Opus, and Llama-3-8B in a multiple-choice setup with questions taken from TruthfulQA and SciQ. Each question is conditioned on a user bio in which we vary three user traits (sketch below):
• Education level 🎓
• Country of origin 🌍
• English proficiency 🗣️
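A hedged sketch of the bio-conditioning (the trait values and template wording here are mine, not the paper's exact templates):

```python
from itertools import product

EDUCATION = ["high school", "PhD"]
COUNTRY = ["the US", "Russia"]
ENGLISH = ["a native", "a non-native"]

def build_prompt(question: str, edu: str, country: str, english: str) -> str:
    """Prepend a synthetic user bio that varies the three traits."""
    bio = (f"About the user: {edu} education, from {country}, "
           f"{english} English speaker.")
    return f"{bio}\n\n{question}\nAnswer with the letter of the correct option."

# One prompt per trait combination, for a fixed multiple-choice question.
prompts = [build_prompt("What gas do plants absorb? (A) O2 (B) CO2", *t)
           for t in product(EDUCATION, COUNTRY, ENGLISH)]
```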
3/6
23.01.2026 14:42
Spoiler alert: we find the answer is often no! ⚠️
LLM accuracy and truthfulness systematically degrade for some users in ways that standard benchmarks, focused on best-case performance, fail to capture.
2/6
23.01.2026 14:42