Special thanks to my fantastic collaborator and primary author Amogh Mannekote for all his great work in making this paper/project happen!
We introduce a framework for evaluating (b), finding that popular models do NOT consistently apply their learned world models when simulating social behavior. The upshot: even when models "know" how people might behave in a given situation, they often fail to apply it in actual simulations!
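To make the evaluation concrete, here's a toy sketch of the belief-behavior comparison (the `query_llm` helper, prompts, and cooperate/defect framing are illustrative stand-ins, not the paper's actual protocol):

```python
from collections import Counter

def elicited_belief(persona, scenario, query_llm):
    """Ask the model directly how likely the persona is to cooperate."""
    prompt = (f"{persona}\n{scenario}\n"
              "On a scale from 0 to 1, how likely is this person to cooperate? "
              "Answer with a single number.")
    return float(query_llm(prompt))

def simulated_rate(persona, scenario, query_llm, n=50):
    """Role-play the persona n times and count cooperative actions."""
    prompt = (f"You are: {persona}\n{scenario}\n"
              "Respond with exactly one word: COOPERATE or DEFECT.")
    actions = Counter(query_llm(prompt) for _ in range(n))
    return actions["COOPERATE"] / n

# Alignment gap: |stated belief - observed behavior rate|. A model that
# "practices what it preaches" keeps this gap small across personas/scenarios.
```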
For LLM social simulations to be useful, models must both (a) learn faithful world models re: how various people might realistically behave in different circumstances; and (b) simulate behavior consistent with that world model.
With all the attention on "agentic LLM social simulations", how do we know if simulated behaviors are realistic? Come by our poster at the #COLM #SocialSim workshop at noon-1pm to find out! (More details in 🧵, or at openreview.net/forum?id=1BD...)
Special thanks to my fantastic collaborators @sewoong-sam-lee.bsky.social, Amogh Mannekote, Marc E. Canby, Julia Hockenmaier, @guohaoli.bsky.social, Kristy Boyer, ChengXiang Zhai, Bonnie J. Dorr, and @frapintoml.bsky.social!
Paper 2: Do Role-Playing Agents Practice What They Preach? Belief-Behavior Alignment in LLM-Based Simulations of Human Trust (SocialSim workshop; openreview.net/forum?id=1BD...)
Paper 1: Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality (main conference; openreview.net/forum?id=Xhd...)
In Montreal at COLM 2025 presenting two papers -- DM me if you'd like to meet! Happy to chat all things NLP, interpretability, or cognitive science; I'm also actively looking for Research Scientist roles (graduating May 2026).
It was a real pleasure to work with my fantastic collaborators at @oxfordtvg.bsky.social on this project 🤝 already looking forward to our future work in this direction!
#OOD #generalization #LLM #steering #ICML
*Come by our poster today to hear more!* It's Tue Jul 15 at 11am-1:30pm (East Exhibition Hall A-B #E-2800). You can also visit our project page at tomalamb.github.io/focus-instru... for more details and links!
This forces models to learn both (a) explicit relationships between latent features and task behaviors 🎯🛠️ and (b) how to dynamically steer generation based on those relationships 🤖
The core idea is to train LLMs to generate different responses to the same task instances by conditioning on "focus"/"ignore" instructions 💡
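A minimal sketch of what that conditioning can look like as training data (field names and instruction templates here are hypothetical; see the paper for the actual construction):

```python
def make_fit_examples(task_instance):
    """Build two training examples from one task instance: same input,
    different target depending on which feature the instruction selects."""
    x = task_instance["text"]
    return [
        {   # steer toward the causal feature
            "instruction": (f"Focus on {task_instance['causal_feature']} and "
                            f"ignore {task_instance['spurious_feature']}."),
            "input": x,
            "target": task_instance["label_from_causal"],
        },
        {   # steer toward the spurious feature instead
            "instruction": (f"Focus on {task_instance['spurious_feature']} and "
                            f"ignore {task_instance['causal_feature']}."),
            "input": x,
            "target": task_instance["label_from_spurious"],
        },
    ]
```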
Great news: we developed an approach to improve instruction tuning so that the "how"/steering instructions DO work, and it even generalizes to unseen features and tasks! 🎉
This means it's ineffective to simply ask models to focus on the "right" (causal 🎯) features and ignore the "wrong" (spurious/biased) ones, which can lead to poor generalization and biased behaviors 😬 Wouldn't it be cool if that DID work, though? 🤔
Traditional instruction tuning teaches LLMs to perform open-ended tasks given text instructions 💬🤖🛠️ But standard techniques are ineffective for controlling (steering) HOW models should perform the task
#ICML2025 paper presentation TODAY (Tue morning): Focus Instruction Tuning -- updating LLM instruction tuning with adaptive test-time steerability 🤖
🧵
Come by our lightning talk at 3:40pm or our poster session at 4pm to hear more (both are located in the East Ballroom A/B). Hope to see you there!
But interpretability methods can sometimes be unreliable 😬 In our second paper (openreview.net/forum?id=tmp...), we define and measure their reliability, finding that concept removal methods are unreliable and counterfactual methods have key tradeoffs between different experimental goals
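For a flavor of what a concept-removal reliability check can look like, here's a self-contained toy (a single linear-projection step on synthetic data; this is not the paper's method or experiments):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))     # stand-in for model representations
y = (X[:, 0] > 0).astype(int)       # a "concept" that lives along one axis

# Fit a linear probe for the concept, then project its direction out.
probe = LogisticRegression(max_iter=1000).fit(X, y)
w = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
X_removed = X - np.outer(X @ w, w)  # remove the component along w

print("probe acc before:", LogisticRegression(max_iter=1000).fit(X, y).score(X, y))
print("probe acc after: ", LogisticRegression(max_iter=1000).fit(X_removed, y).score(X_removed, y))
# On this idealized data, one projection drives the probe to chance; in real
# models the concept is spread across many directions, and whether "removal"
# actually removed anything is exactly the reliability question.
```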
Models fail to generalize under distribution shift if they rely on spurious features In CALM (openreview.net/forum?id=x6Z...), we study whether models rely more on spurious or causal features for a range of tasks -- TLDR: they rely on both, leading to high performance ceilings but low floors!
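Here's a hedged sketch of the kind of spurious-vs-causal stress test this involves (the `model` callable and data fields are hypothetical; CALM's actual tasks and metrics are in the paper):

```python
def feature_reliance(model, dataset):
    """Compare accuracy when a spurious feature agrees vs. conflicts with
    the causal (label-determining) feature."""
    aligned     = [ex for ex in dataset if ex["spurious"] == ex["label"]]
    conflicting = [ex for ex in dataset if ex["spurious"] != ex["label"]]
    acc = lambda split: sum(model(ex["input"]) == ex["label"] for ex in split) / len(split)
    # High accuracy on `aligned` but low on `conflicting` indicates spurious
    # reliance; the spread between the two is the ceiling-vs-floor gap.
    return acc(aligned), acc(conflicting)
```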
How can we interpret what features LLMs use to perform a given task? 🤔 And how do we know if our interpretation is correct?
Excited to be presenting 2 papers + an oral on these questions in the #InterpretableAI workshop at #neurips2024 📢 -- come by our posters/talk to hear more!
Check out our project page at arshiahemmat.github.io/illusionbench/
Special thanks to my fabulous co-authors Arshia Hemmat, Tom Lamb, @dydyydyyyd.bsky.social, Phil Torr, Ashkan Khakzar, and @frapintoml.bsky.social -- loved working with you all, and can't wait for our next paper! 🎉
I'm excited to be presenting our paper -- Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models -- today at NeurIPS (West Ballroom A-D, Poster 5202). Hope to see you there!
Shape perception is fundamental to human vision 👁️📷 but years of research on shape vs. texture bias has relied on benchmarks that are simplistic relative to today's best VLMs 🤖🧠 It's time for a new dataset generated with methods as powerful as the models we're testing! 🦾
Introducing 🪄 IllusionBench 🎩, our multimodal shape recognition benchmark at #NeurIPS2024
🎯 Can vision-language models recognize these shapes? (❌ nope!)
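For a flavor of the evaluation, here's an illustrative zero-shot probe with CLIP via HuggingFace Transformers (the image file and shape labels are made up; this is not the official IllusionBench evaluation script):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical image whose global layout traces one of the candidate shapes.
image = Image.open("scene_shaped_like_a_dog.png")
shapes = ["a dog", "a cat", "an airplane", "a guitar"]
texts = [f"a scene arranged in the shape of {s}" for s in shapes]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)

# If the model leans on texture/content over global shape, the probability
# mass lands on the wrong class.
print(dict(zip(shapes, probs[0].tolist())))
```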