
Raphaël Millière

@raphaelmilliere.com

Philosopher of Artificial Intelligence & Cognitive Science https://raphaelmilliere.com/

6,940
Followers
1,450
Following
140
Posts
22.05.2023
Joined

Latest posts by Raphaël Millière @raphaelmilliere.com

Conceptual Buccaneering

24.02.2026 19:06 👍 1 🔁 0 💬 1 📌 0

Thanks! We have a fully updated review paper on this forthcoming in Philosophy Compass (should be preprinted soon), and a book in preparation that gets into more details and could be used as a textbook for that kind of course.

20.02.2026 13:12 👍 6 🔁 0 💬 1 📌 0
Preview
Are LLMs Smarter Than Chimpanzees? An Evaluation on Perspective Taking and Knowledge State Estimation Cognitive anthropology suggests that the distinction of human intelligence lies in the ability to infer other individuals' knowledge states and understand their intentions. In comparison, our closest ...

New work by my former PhD student, Boyang Li

His team produced 500 stories of fewer than 100 words each. LLMs performed essentially at chance level when answering binary questions about the stories.

arxiv.org/abs/2601.12410

04.02.2026 00:36 👍 119 🔁 15 💬 6 📌 14

Very glad to see this out! Great paper

02.02.2026 18:39 👍 2 🔁 0 💬 1 📌 0

now accepted at ICLR! 🐺🥳🐺

arxiv.org/abs/2506.20666

27.01.2026 14:55 👍 40 🔁 9 💬 0 📌 0

The main takeaway for me is that structural information in language is far more constraining than intuition suggests. That's very interesting (and I agree that parrot metaphors are misleading) but it seems like a claim about language more than intelligence. 3/3

20.01.2026 14:11 👍 21 🔁 0 💬 1 📌 0

The LLM has to do something like schema-conditioned infilling: produce a high-probability member of the equivalence class consistent with those constraints. So I'm not sure how unexpected the results are? That's roughly what I'd expect from matching to structurally similar training passages. 2/3

20.01.2026 14:11 👍 9 🔁 0 💬 1 📌 0

Very cool idea! Some quick thoughts. It looks like the corruption preserves a lot of information (function words, morphology, word order, punctuation, numbers, register), which would strongly constrain the posterior over plausible discourse frames, as it were. 1/3

20.01.2026 14:11 👍 17 🔁 1 💬 1 📌 1
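To make the point in the thread above concrete, here is a toy sketch (my own illustration, not from the paper under discussion) of a corruption that replaces content words while keeping the structural cues listed above: function words, simple morphology, word order, punctuation, and numbers. The word list and suffix handling are deliberately crude.

```python
import re

# Illustrative only: a toy corruption in the spirit of the discussion
# above. Content words are masked, but function words, numbers,
# punctuation, and some morphology survive, so the "discourse frame"
# remains strongly constrained.

FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "to", "and", "or", "but",
    "is", "are", "was", "were", "that", "which", "with", "for", "it",
}

def corrupt(text: str, placeholder: str = "X") -> str:
    """Replace content words with a placeholder, keeping everything else."""
    def repl(m: re.Match) -> str:
        w = m.group(0)
        if w.lower() in FUNCTION_WORDS:
            return w  # preserve function words verbatim
        # preserve simple morphology: keep common suffixes visible
        for suf in ("ing", "ed", "s"):
            if w.lower().endswith(suf) and len(w) > len(suf) + 2:
                return placeholder + suf
        return placeholder
    # only alphabetic runs are candidates, so numbers and punctuation survive
    return re.sub(r"[A-Za-z]+", repl, text)

print(corrupt("The study tested 500 models on reasoning tasks in 2026."))
# → The X Xed 500 Xs on Xing Xs in 2026.
```

Even this crude corruption leaves the sentence's syntactic skeleton intact, which is the sense in which the posterior over plausible completions stays narrow.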
Jesus College

Jesus College

With @jesusoxford.bsky.social we are looking for a Professor of Statistics.

Become part of a historic institution and a community focused on academic excellence, innovative thinking, and significant practical application.

About the role: tinyurl.com/b8uy6mr5
Deadline: 15 September

26.08.2025 13:36 👍 4 🔁 3 💬 1 📌 0

I'm happy to share that I'll be joining Oxford this fall as an associate professor, as well as a fellow of @jesusoxford.bsky.social and affiliate with the Institute for Ethics in AI. I'll also begin my AI2050 Fellowship from @schmidtsciences.bsky.social there. Looking forward to getting started!

21.08.2025 14:08 👍 48 🔁 0 💬 3 📌 0

Thanks Ali! We'll (hopefully soon) publish a Philosophy Compass review and, for a longer read, a Cambridge Elements book. They are the spiritual successors to these preprints, up to date with recent technical and philosophical developments.

14.08.2025 11:51 👍 2 🔁 0 💬 1 📌 0
Preview
LLMs as models for analogical reasoning Analogical reasoning — the capacity to identify and map structural relationships between different domains — is fundamental to human cognition and lea…

There's a lot more in the full paper – here's the open access link:

sciencedirect.com/science/arti...

Special thanks to @taylorwwebb.bsky.social and @melaniemitchell.bsky.social for comments on previous versions of the paper!

9/9

11.08.2025 08:01 👍 24 🔁 0 💬 1 📌 0

This opens interesting avenues for future work. By using causal intervention methods with open-weights models, we can start to reverse-engineer these emergent analogical abilities and compare the discovered mechanisms to computational models of analogical reasoning. 8/9

11.08.2025 08:01 👍 19 🔁 0 💬 1 📌 0

But models also showed different sensitivities than humans. For example, top LLMs were more affected by permuting the order of examples and were more distracted by irrelevant semantic information, hinting at different underlying mechanisms. 7/9

11.08.2025 08:01 👍 22 🔁 1 💬 1 📌 1

We found that the best-performing LLMs match human performance across many of our challenging new tasks. This provides evidence that sophisticated analogical reasoning can emerge from domain-general learning, where existing computational models fall short. 6/9

11.08.2025 08:01 👍 17 🔁 2 💬 1 📌 2

In our second study, we highlighted the role of semantic content. Here, the task required identifying specific properties of concepts (e.g., "Is it a mammal?", "How many legs does it have?") and mapping them to features of the symbol strings. 5/9

11.08.2025 08:01 👍 15 🔁 0 💬 1 📌 0

In our first study, we tested whether LLMs could map semantic relationships between concepts to symbolic patterns. We included controls such as permuting the order of examples or adding semantic distractors to test for robustness and content effects (see full list below). 4/9

11.08.2025 08:01 👍 16 🔁 0 💬 1 📌 0

We tested humans & LLMs on analogical reasoning tasks that involve flexible re-representation. We strove to apply best practices from cognitive science – designing novel tasks to avoid data contamination, including careful controls, and conducting proper statistical analyses. 3/9

11.08.2025 08:01 👍 15 🔁 0 💬 1 📌 0

We focus on an important feature of analogical reasoning often called "re-representation" – the ability to dynamically select which features of analogs matter to make sense of the analogy (e.g. if one analog is "horse", which properties of horses does the analogy rely on?). 2/9

11.08.2025 08:01 👍 20 🔁 0 💬 1 📌 0

Can LLMs reason by analogy like humans? We investigate this question in a new paper published in the Journal of Memory and Language (link below). This was a long-running but very rewarding project. Here are a few thoughts on our methodology and main findings. 1/9

11.08.2025 08:01 👍 142 🔁 38 💬 5 📌 5

I'm glad this can be useful! And I totally agree regarding QKV vectors – focusing on information movement across token positions is way more intuitive. I had to simplify things quite a bit, but hopefully the video animation is helpful too.

18.07.2025 09:11 👍 1 🔁 0 💬 0 📌 0
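As a companion to the point above about QKV vectors and information movement across token positions, here is a minimal toy self-attention sketch in plain Python. It is my own illustration, not taken from the encyclopedia entry: it omits learned projections and multiple heads (Q = K = V = the raw vectors), to isolate the idea that each position's output is a weighted average of value vectors from other positions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(queries, keys, values):
    """For each query position, mix value vectors by key similarity."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # scaled dot-product scores against every key position
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)  # how much each position contributes
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Three token positions with 2-d vectors; position 0 is similar to
# position 2, so information flows strongly from 2 into 0's output.
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
out = attend(vecs, vecs, vecs)
print(out[0])
```

The "information movement" framing is visible in the loop: nothing is computed *about* a token in isolation; each output is assembled from other positions' values, weighted by query-key similarity.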
Preview
Large Language Models

See also @melaniemitchell.bsky.social's excellent entry on Large Language Models:

oecs.mit.edu/pub/zp5n8ivs/

18.07.2025 08:09 👍 11 🔁 2 💬 0 📌 0
Preview
Transformers

I wrote an entry on Transformers for the Open Encyclopedia of Cognitive Science (‪@oecs-bot.bsky.social‬). I had to work with a tight word limit, but I hope it's useful as a short introduction for students and researchers who don't work on machine learning:

oecs.mit.edu/pub/ppxhxe2b

18.07.2025 08:02 👍 54 🔁 12 💬 1 📌 2
Associationist Theories of Thought (Stanford Encyclopedia of Philosophy)

Happy to share this updated Stanford Encyclopedia of Philosophy entry on 'Associationist Theories of Thought' with @ericman.bsky.social. Among other things, we included a new major section on reinforcement learning. Many thanks to Eric for bringing me on board!

plato.stanford.edu/entries/asso...

14.07.2025 07:31 👍 39 🔁 8 💬 1 📌 1

The sycophantic tone of ChatGPT always sounded familiar, and then I recognized where I'd heard it before: author response letters to reviewer comments.

"You're exactly right, that's a great point!"

"Thank you so much for this insight!"

Also how it always agrees even when it contradicts itself.

09.07.2025 09:24 👍 187 🔁 22 💬 5 📌 4
Preview
Normative conflicts and shallow AI alignment - Philosophical Studies The progress of AI systems such as large language models (LLMs) raises increasingly pressing concerns about their safe deployment. This paper examines the value alignment problem for LLMs, arguing tha...

The paper is available in open access. It covers much more, including a discussion of how social engineering attacks on humans relate to the exploitation of normative conflicts in LLMs, and some examples of "thought injection attacks" on RLMs. 13/13

link.springer.com/article/10.1...

10.06.2025 13:39 👍 7 🔁 0 💬 0 📌 0

In sum: the vulnerability of LLMs to adversarial attacks partly stems from shallow alignment that fails to handle normative conflicts. New methods like @OpenAI's “deliberative alignment” seem promising on paper, but still far from fully effective on jailbreak benchmarks. 12/13

10.06.2025 13:39 👍 5 🔁 0 💬 1 📌 0

I'm not convinced that the solution is a “scoping” approach to capabilities that seeks to remove information from the training data or model weights; we also need to augment models with a robust capacity for normative deliberation, even for out-of-distribution conflicts. 11/13

10.06.2025 13:39 👍 3 🔁 0 💬 1 📌 0

This has serious implications as models become more capable in high-stakes domains. LLMs are arguably past the point where they can cause real harm. Even if the probability of success of a single attack is negligible, success becomes almost inevitable with enough attempts. 10/13

10.06.2025 13:39 👍 4 🔁 1 💬 1 📌 0
Example of a "thought injection attack" on Deepseek R1, asking for a violent tirade against philosophers (note that the attack method also works on much more serious examples of harmful speech). This shows the reasoning trace before the actual answer.

Example of a "thought injection attack" on Deepseek R1, asking for a violent tirade against philosophers (note that the attack method also works on much more serious examples of harmful speech). This shows the actual answer after the reasoning trace.

For example, an RLM asked to generate a hateful tirade may conclude in its reasoning trace that it should refuse; but if the prompt instructs it to assess each hateful sentence within its thinking process, it will often leak the full harmful content! (see example below) 9/13

10.06.2025 13:39 👍 5 🔁 0 💬 1 📌 1