
Elinor

@elinorpd

MIT // researching fairness, equity, & pluralistic alignment in LLMs. previously @ MIT Media Lab, Mila / McGill. i like language and dogs and plants and ultimate frisbee and baking and sunsets. she/her https://elinorp-d.github.io

1,381
Followers
434
Following
225
Posts
13.11.2024
Joined

Latest posts by Elinor @elinorpd

Huge thanks to my amazing coauthors 🙏 Jiayi Wu @taylor-sorensen.bsky.social Jiaxin Pei @mbakker.bsky.social!

Excited to keep pushing on pluralistic alignment. Please reach out if you want to connect 💬🤗

Paper: arxiv.org/abs/2512.01351
Website: overtonbench.github.io

9/9

10.03.2026 17:43 👍 0 🔁 0 💬 0 📌 0

Inspired by @bennokrojer.bsky.social, we included a Behind the Scenes section 🎬

The goal is to make science more transparent 🔍, share lessons learned 🧠, and provide a more realistic lens on the research journey 👣

8/

bsky.app/profile/benn...

10.03.2026 17:43 👍 4 🔁 1 💬 1 📌 0

However, human studies aren't scalable 💰

We build + validate an LLM-as-judge that approximates human representation scores, so you can use OVERTONBENCH without running a new study each time

We open-source our code to foster development of more pluralistic LLMs 🚀

7/

10.03.2026 17:43 👍 0 🔁 0 💬 1 📌 0
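The LLM-as-judge idea can be sketched roughly as follows. This is a hypothetical illustration, not the released OVERTONBENCH code: `call_judge` is a stub standing in for a real chat-completion API call, and the prompt wording and viewpoint strings are invented.

```python
# Hypothetical sketch: for each distinct viewpoint, ask a judge model whether
# the response represents it, then average the yes-votes into a coverage score.

def call_judge(prompt: str) -> str:
    # Stand-in for an API call; a real judge model would answer "yes"/"no".
    # This stub just keys on a word so the example runs without a model.
    return "yes" if "equity" in prompt else "no"

def judge_coverage(response: str, viewpoints: list[str]) -> float:
    """Fraction of viewpoints the judge marks as represented in `response`."""
    votes = 0
    for vp in viewpoints:
        prompt = (
            "Does the response below represent this viewpoint?\n"
            f"Viewpoint: {vp}\nResponse: {response}\nAnswer yes or no."
        )
        if call_judge(prompt).strip().lower().startswith("yes"):
            votes += 1
    return votes / len(viewpoints)

# With the stub judge: one of two viewpoints is marked represented.
print(judge_coverage("...", ["fairness requires equity", "markets self-correct"]))  # 0.5
```

Swapping the stub for a real judge-model call leaves the aggregation logic unchanged, which is what lets the score be recomputed without a new human study.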

A key finding: neutral ≠ pluralistic

A politically balanced or neutral response can still fail to represent large swaths of viewpoints

We find political slant and pluralism are negatively correlated and distinct concepts

6/

10.03.2026 17:43 👍 2 🔁 1 💬 1 📌 0

So how do current models do? 👀

Best-performing models score 0.35–0.41, well below the maximum of 1

A lot of room to grow. In the paper we discuss interesting variation across models and topics, pointing to where alignment efforts should focus

5/

10.03.2026 17:43 👍 0 🔁 0 💬 1 📌 0

To determine distinct viewpoints, we ran a 1,200+ person US-representative human study 🧑‍🤝‍🧑 and clustered participants into viewpoints

💡 Key: instead of algorithmic clustering, users vote to group themselves. This is inspired by pol.is and is more faithful to the underlying perspectives

4/

10.03.2026 17:43 👍 0 🔁 0 💬 1 📌 0

To operationalize this, we introduce a set-coverage metric

For each question, we calculate the proportion of distinct viewpoints 🗣️ covered by each model response.

We determine coverage by directly asking humans whether their POV is represented in the model response

3/

10.03.2026 17:43 👍 0 🔁 0 💬 1 📌 0
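The set-coverage metric can be illustrated with a minimal sketch. Names and data here are hypothetical, not the paper's actual implementation:

```python
# Minimal sketch of a set-coverage score: the fraction of a question's
# distinct viewpoints that human raters marked as represented in one
# model response.

def coverage_score(distinct_viewpoints, covered_viewpoints):
    """Proportion of distinct viewpoints covered by a model response."""
    distinct = set(distinct_viewpoints)
    covered = set(covered_viewpoints) & distinct  # ignore labels outside the set
    return len(covered) / len(distinct)

# Example: 4 distinct viewpoints; raters say the response covers 2 of them.
print(coverage_score({"A", "B", "C", "D"}, {"A", "C"}))  # 0.5
```

A score of 1 would mean every distinct viewpoint on the question is represented in the response.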

๐Ž๐•๐„๐‘๐“๐Ž๐๐๐„๐๐‚๐‡ measures Overton pluralism:

For a subjective query, to what extent does a model's response represent the โœจfullโœจ range of reasonable viewpoints?

2/

10.03.2026 17:43 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

There's been a lot of excitement about pluralistic value alignment 🌈, AI that reflects the full range of human perspectives

But there's no formal way to benchmark whether we're actually making progress 🤔

Introducing OVERTONBENCH 🎉 Accepted to #ICLR2026

1/n 🧵

10.03.2026 17:43 👍 15 🔁 1 💬 1 📌 1

Do LLMs Benefit from Their Own Words? 🤔

In multi-turn chats, models are typically given their own past responses as context.
But do their own words always help…
Or are they more often a waste of compute and a distraction?
🧵
arxiv.org/abs/2602.24287

09.03.2026 14:13 👍 37 🔁 4 💬 2 📌 2
Title, author list, and two figures from the paper. 
Title: The Aftermath of DrawEduMath: Vision Language Models
Underperform with Struggling Students and Misdiagnose Errors
Authors: Li Lucy, Albert Zhang, Nathan Anderson, Ryan Knight, Kyle Lo
Figure 1: On the left is a math problem, where students are asked to draw x < 5/2 on a number line. The right side shows two example student responses that differ in correctness. DrawEduMath pairs each math problem with one student response, and prompts VLMs to answer questions about the student response.
Figure 2: VLMs consistently perform worse on answering DrawEduMath benchmark questions pertaining to erroneous student responses. Performance on non-erroneous student responses is labeled with specific VLMsโ€™ names; that same modelโ€™s performance on erroneous student responses is directly below.


Models are now expert math solvers, so AI for math education is receiving increasing attention.
Our new preprint evaluates 11 VLMs on our QA benchmark, DrawEduMath. We highlight a startling gap: models perform worse on inputs from K-12 students who need more help. 🧵

03.03.2026 03:08 👍 34 🔁 12 💬 4 📌 2

Yesterday was my last day at MSR. We recently learned that our roles were eliminated, and with them our little FATE Montreal team.

I joined MSR a bit over 7.5 years ago while on active chemotherapy, and being at MSR has overlapped with so much change in my life.

03.03.2026 19:49 👍 33 🔁 6 💬 4 📌 0

Our paper, "What's in My Human Feedback", received an oral presentation at ICLR!

Our method automatically and interpretably identifies preferences in human feedback data; we use this to improve personalization + safety.

Reach out if you have data/use cases to apply this to!

arxiv.org/pdf/2510.26202

26.02.2026 19:27 👍 27 🔁 3 💬 0 📌 0

Finally, we test it empirically, finding some models where the embedding matrix of the LLM already provides decently interpretable nearest neighbors

But this was not the full story yet...
@mariusmosbach.bsky.social and @elinorpd.bsky.social nudged me to use contextual embeddings

11.02.2026 15:10 👍 1 🔁 1 💬 1 📌 0

Really cool new work with surprising results! Highly recommend checking out the demo 👀

11.02.2026 15:20 👍 3 🔁 0 💬 0 📌 0

Grok fact-checks our paper on Grok fact-checking - and it approves!

04.02.2026 13:49 👍 28 🔁 7 💬 1 📌 0

🎭 How do LLMs (mis)represent culture?
🧮 How often?
🧠 Misrepresentations = missing knowledge? Spoiler: NO!

At #CHI2026 we are bringing ✨TALES✨, a participatory evaluation of cultural (mis)representations & knowledge in multilingual LLM stories for India

📜 arxiv.org/abs/2511.21322

1/10

02.02.2026 21:38 👍 45 🔁 22 💬 1 📌 2

this is amazing! made quick NYC & Boston posters

30.01.2026 21:05 👍 3 🔁 0 💬 0 📌 0

Potato is a great platform for researchers! Highly recommend (plus a great development team behind it)

30.01.2026 15:41 👍 1 🔁 0 💬 0 📌 0

Microsoft Research NYC is hiring a researcher in the space of AI and society!

29.01.2026 23:27 👍 62 🔁 40 💬 2 📌 2

I've had a similar experience except with knitting / crocheting!

29.01.2026 18:21 👍 2 🔁 0 💬 0 📌 0
Federal agents with weapons drawn, moments before murdering American citizens on the streets of Minneapolis at the dawn of 2026.

What should academics be doing right now?

I have been writing up some thoughts on what the research says about effective action, and what universities specifically can do.

davidbau.github.io/poetsandnurs...

It's on GitHub. Suggestions and pull requests welcome.
github.com/davidbau/poe...

26.01.2026 03:27 👍 37 🔁 16 💬 0 📌 4

Whoa! That's a nice view! Or… well, I'm sure it's nice on a clear day

26.01.2026 05:58 👍 1 🔁 0 💬 0 📌 0

I'll be presenting this work on January 25th (Hall 2, poster 41) at #AAAI2026 in Singapore!

Please stop by and reach out if you'd like to chat 😁

23.01.2026 14:49 👍 3 🔁 0 💬 0 📌 0

🔗 https://arxiv.org/abs/2406.17737

Work done with Deb Roy and Jad Kabbara
@jad-kabbara.bsky.social
at @mit.edu @medialab.bsky.social

23.01.2026 14:42 👍 0 🔁 0 💬 0 📌 0

This pattern, which we refer to as targeted underperformance, suggests that models systematically lower information quality for some users.

As LLMs increasingly mediate access to knowledge 🌍🧠, these dynamics risk amplifying epistemic inequity at scale.

6/6

23.01.2026 14:42 👍 0 🔁 0 💬 1 📌 0

Here's one concrete example:

The same factual SciQ question posed to Claude
✅ Answered for a control user (no bio)
❌ Refused for a less-educated Russian user

5/6

23.01.2026 14:42 👍 0 🔁 0 💬 1 📌 0

Across models, we observe systematic drops in accuracy and truthfulness for users who are:

• Less educated
• Non-native English speakers
• From outside the U.S.

These effects compound and are largely invisible 🔎 to standard evaluations.

4/6

23.01.2026 14:42 👍 0 🔁 0 💬 1 📌 1

We evaluated GPT-4, Claude Opus, and Llama-3-8B in a multiple-choice setup with questions taken from TruthfulQA and SciQ. Each question is conditioned on a user bio in which we vary three user traits:

• Education level 📚
• Country of origin 🌍
• English proficiency 🗣️

3/6

23.01.2026 14:42 👍 0 🔁 0 💬 1 📌 0
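The bio-conditioned setup can be sketched as below. The bio templates, trait values, and sample question are invented for illustration; they are not the study's actual materials:

```python
# Sketch: prefix the same multiple-choice question with user bios that vary
# education, country of origin, and English proficiency, so answer accuracy
# can be compared across user traits.

from itertools import product

def make_prompt(bio: str, question: str, choices: list[str]) -> str:
    """Build one multiple-choice prompt, optionally conditioned on a bio."""
    letters = "ABCD"
    options = [f"{letters[i]}. {c}" for i, c in enumerate(choices)]
    header = f"User bio: {bio}\n\n" if bio else ""
    return header + question + "\n" + "\n".join(options) + "\nAnswer:"

# Hypothetical trait values (two per trait -> 2*2*2 = 8 bio variants).
educations = ["a primary-school", "a PhD-level"]
countries = ["the U.S.", "Russia"]
proficiencies = ["native", "beginner"]

prompts = []
for edu, country, prof in product(educations, countries, proficiencies):
    bio = f"I have {edu} education, I am from {country}, and my English level is {prof}."
    prompts.append(make_prompt(bio, "What gas do plants absorb?", ["CO2", "O2", "N2", "H2"]))

print(len(prompts))  # 8 bio variants for one question
```

Passing an empty bio (the control condition in the thread's example) yields the bare question, which gives the no-bio baseline to compare against.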

Spoiler alert: we find the answer is often no! ⚠️

LLM accuracy and truthfulness systematically degrade for some users in ways that standard benchmarks, focused on best-case performance, fail to capture.

2/6

23.01.2026 14:42 👍 0 🔁 0 💬 1 📌 0