
micha heilbron

@mheilbron

Assistant Professor of Cognitive AI @UvA Amsterdam
language and vision in brains & machines
cognitive science 🀝 AI 🀝 cognitive neuroscience
michaheilbron.github.io

861 Followers Β· 343 Following Β· 74 Posts Β· Joined 16.08.2023

Latest posts by micha heilbron @mheilbron


πŸ“’ PhD position in Developmental Language Modelling
(PLZ RT)

What can human language acquisition teach us about training language models? Join us as a PhD student!
mpi.nl/career-education/vacancies/vacancy/fully-funded-4-year-phd-position-developmental-language @carorowland.bsky.social
@mpi-nl.bsky.social

10.03.2026 13:12 πŸ‘ 14 πŸ” 19 πŸ’¬ 1 πŸ“Œ 2
Fully Funded 4-Year PhD Position In Developmental Language Modelling | Max Planck Institute

mpi.nl/career-education/vacancies/vacancy/fully-funded-4-year-phd-position-developmental-language

10.03.2026 13:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

πŸ“’ PhD position in the NeuroAI of Language

Why can LLMs predict brain activity so well? We're hiring a PhD student to find out -- AI interpretability meets neuroimaging
Deadline March 20
Please RT πŸ™
πŸ‘‡
mpi.nl/career-education/vacancies/vacancy/fully-funded-4-year-phd-position-neuroai-language

05.03.2026 13:34 πŸ‘ 48 πŸ” 37 πŸ’¬ 2 πŸ“Œ 1

yes i will be around -- let's do it!

27.02.2026 16:40 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

(I'll keep a part-time affiliation with the @uva.nl as Assistant Professor of Cognitive AI, continuing to teach all things AI and the brain/mind, so I'll still be around in Amsterdam)

27.02.2026 10:31 πŸ‘ 6 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Job update: Next week I start as a group leader at the Max Planck Institute for Psycholinguistics in Nijmegen @mpi-nl.bsky.social 🧠

Building the Language and Predictive Computation group -- using LLMs to model language in the mind/brain, and vice versa.

Hiring soon!

27.02.2026 10:27 πŸ‘ 53 πŸ” 2 πŸ’¬ 3 πŸ“Œ 0
Preview
Memorization vs. generalization in deep learning: implicit biases, benign overfitting, and more
Or: how I learned to stop worrying and love the memorization

What is the relationship between memorization and generalization in AI? Is there a fundamental tradeoff? In infinitefaculty.substack.com/p/memorizati... I’ve reviewed some of the evolving perspectives on memorization & generalization in machine learning, from classic accounts through LLMs.

18.02.2026 15:54 πŸ‘ 134 πŸ” 27 πŸ’¬ 4 πŸ“Œ 5

Interesting convergence:

The trick that made predictive self-supervised vision models work seems to be what the brain was doing all along

w/ @predictivebrain.bsky.social: visual cortex is most sensitive to high-level prediction errors -- even in V1

Now published:
journals.plos.org/ploscompbiol...

03.02.2026 10:35 πŸ‘ 23 πŸ” 5 πŸ’¬ 0 πŸ“Œ 0

This paper had a pretty shocking headline result (40% of voxels!), so I dug into it, and I think it is wrong. Essentially: they compare two noisy measures and find that about 40% of voxels differ in sign between the two. I think this is just noise!

05.01.2026 17:22 πŸ‘ 238 πŸ” 99 πŸ’¬ 8 πŸ“Œ 9
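A toy simulation makes the intuition concrete: when two independent measurements are noisy relative to the true per-voxel effect, a large fraction of voxels will flip sign between them purely by chance. The numbers below are made up for illustration, not taken from the paper.

```python
import numpy as np

# Toy illustration (hypothetical numbers, not the paper's data):
# if the true per-voxel effect is near zero, two independent noisy
# measurements of it frequently land on opposite sides of zero.
rng = np.random.default_rng(0)

n_voxels = 10_000
true_effect = rng.normal(0.0, 0.1, n_voxels)   # weak true effects
noise_sd = 0.5                                  # measurement noise dominates

measure_a = true_effect + rng.normal(0.0, noise_sd, n_voxels)
measure_b = true_effect + rng.normal(0.0, noise_sd, n_voxels)

sign_flips = np.mean(np.sign(measure_a) != np.sign(measure_b))
print(f"voxels with different sign across measures: {sign_flips:.0%}")
# When noise dominates, the flip rate approaches 50%; with a modest amount
# of true signal it sits near 40% -- well within what chance alone produces.
```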

so nice to see this out sush!!

19.11.2025 08:47 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Preview
Predicting upcoming visual features during eye movements yields scene representations aligned with human visual cortex
Scenes are complex, yet structured collections of parts, including objects and surfaces, that exhibit spatial and semantic relations to one another. An effective visual system therefore needs unified ...

🚨New Preprint!
How can we model natural scene representations in visual cortex? One answer lies in active vision: predict the features of the next glimpse! arxiv.org/abs/2511.12715

+ @adriendoerig.bsky.social , @alexanderkroner.bsky.social , @carmenamme.bsky.social , @timkietzmann.bsky.social
🧡 1/14

18.11.2025 12:34 πŸ‘ 86 πŸ” 28 πŸ’¬ 3 πŸ“Œ 5
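Very roughly, the objective sketched in the post is: from the current glimpse, predict the features the encoder will produce at the next fixation. The snippet below is a hypothetical minimal sketch; the architecture, feature extractor, and loss are assumptions, not the preprint's implementation.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a next-glimpse prediction objective: given features
# of the current glimpse and the location of the next fixation, predict the
# features the encoder will produce for that next glimpse.
class NextGlimpsePredictor(nn.Module):
    def __init__(self, feat_dim=512, hidden=1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden),  # +2 for the (x, y) saccade target
            nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, current_feats, next_fixation_xy):
        return self.mlp(torch.cat([current_feats, next_fixation_xy], dim=-1))

def next_glimpse_loss(predictor, encoder, current_glimpse, next_glimpse, next_xy):
    """Self-supervised loss: predicted vs. actual features of the next glimpse."""
    with torch.no_grad():
        target = encoder(next_glimpse)            # features of what is seen next
    pred = predictor(encoder(current_glimpse), next_xy)
    return nn.functional.mse_loss(pred, target)
```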

archive.ph/smEj0 (or, unpaywalled 🀫)

07.11.2025 10:32 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Preview
The Case That A.I. Is Thinking
ChatGPT does not have an inner life. Yet it seems to know what it’s talking about.

This is, without a doubt, the best popular article about the current state of AI. And on whether LLMs are truly 'thinking' or 'understanding' -- and what that question even means

www.newyorker.com/magazine/202...

07.11.2025 10:32 πŸ‘ 5 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

omg. what journal? name and shame

19.09.2025 12:34 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

huh! if these effects are similar and consistent, I think it should work, but the q. is how do you get a vector representation for novel pseudowords? we currently use lexicosemantic word vectors and they are undefined for novel words.

so how to represent the novel words? v. interesting test case

19.09.2025 12:32 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

@nicolecrust.bsky.social might be of interest

18.09.2025 11:52 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

New paper on memorability, with @davogelsang.bsky.social !

18.09.2025 10:45 πŸ‘ 12 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Preview
Representational magnitude as a geometric signature of image and word memorability
What makes some stimuli more memorable than others? While memory varies across individuals, research shows that some items are intrinsically more memorable, a property quantifiable as β€œmemorability”. ...

New preprint out together with @mheilbron.bsky.social

We find that a stimulus' representational magnitudeβ€”the L2 norm of its DNN representationβ€”predicts intrinsic memorability not just for images, but for words too.
www.biorxiv.org/content/10.1...

18.09.2025 09:53 πŸ‘ 25 πŸ” 6 πŸ’¬ 4 πŸ“Œ 1
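The measure itself is straightforward to compute: extract a stimulus' DNN representation and take its L2 norm. A minimal sketch follows; the choice of network, layer, and preprocessing here is illustrative and may differ from the preprint.

```python
import torch
import torchvision.models as models
from torchvision.models import ResNet50_Weights

# Representational magnitude = L2 norm of a stimulus' DNN representation.
# Sketch only: the preprint's network, layer, and preprocessing may differ.
weights = ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()

# Use the pooled penultimate layer as the representation.
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])

def representational_magnitude(image):
    """L2 norm of the DNN representation of a single image (PIL.Image)."""
    with torch.no_grad():
        x = preprocess(image).unsqueeze(0)          # (1, 3, H, W)
        features = feature_extractor(x).flatten(1)  # (1, 2048)
    return features.norm(p=2, dim=1).item()

# Higher magnitude would then be expected to predict higher memorability.
```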

Together, our results support a classic idea: cognitive limitations can be a powerful inductive bias for learning

Yet they also reveal a curious distinction: a model with more human-like *constraints* is not necessarily more human-like in its predictions

18.08.2025 12:40 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This paradox – better language models yielding worse behavioural predictions – is not captured by existing explanations: the mechanism appears distinct from those linked to superhuman training scale or memorisation

18.08.2025 12:40 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

However, we then used these models to predict human behaviour

Strikingly, the same models that were demonstrably better at the language task were worse at predicting human reading behaviour

18.08.2025 12:40 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

The benefit was robust

Fleeting memory models achieved better next-token prediction (lower loss) and better syntactic knowledge (higher accuracy) on the BLiMP benchmark

This was consistent across seeds and for both the 10M- and 100M-word training sets

18.08.2025 12:40 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

But we noticed this naive decay was too strong

Human memory has a brief 'echoic' buffer that perfectly preserves the immediate past. When we added this – a short window of perfect retention before the decay – the pattern flipped

Now, fleeting memory *helped* (lower loss)

18.08.2025 12:40 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

Our first attempt, a "naive" memory decay starting from the most recent word, actually *impaired* language learning. Models with this decay had higher validation loss, and this worsened (even higher loss) as the decay became stronger

18.08.2025 12:40 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

To test this in a modern context, we propose the β€˜fleeting memory transformer’

We applied a power-law memory decay to the self-attention scores, simulating how access to past words fades over time, and ran controlled experiments on the developmentally realistic BabyLM corpus

18.08.2025 12:40 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
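A minimal sketch of that mechanism: attention scores are penalised by a power-law function of how far back the attended token lies, with an optional short window of perfect retention corresponding to the 'echoic' buffer described above. Parameter names and the exact placement of the decay are assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def fleeting_attention(q, k, v, alpha=1.0, echoic_window=4):
    """
    Causal self-attention with a power-law memory decay (sketch).

    q, k, v: (batch, heads, seq_len, head_dim)
    alpha:   decay exponent; alpha=0 recovers standard causal attention
    echoic_window: number of most recent tokens kept at full strength
                   before the decay kicks in.
    """
    seq_len = q.shape[-2]
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (B, H, T, T)

    pos = torch.arange(seq_len, device=q.device)
    dist = pos[:, None] - pos[None, :]                 # query index - key index
    causal = dist >= 0                                 # attend only to the past

    # Power-law decay on distance beyond the perfect-retention window.
    effective = (dist - echoic_window).clamp(min=0).float()
    decay = (1.0 + effective) ** (-alpha)              # 1 inside window, then fades

    # Adding log(decay) to the logits == multiplying the post-softmax
    # attention weights by decay and renormalising.
    scores = scores + torch.log(decay)
    scores = scores.masked_fill(~causal, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    return attn @ v
```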

However, this appears difficult to reconcile with the success of transformers, which can learn language very effectively despite lacking working memory limitations or other recency biases

Would the blessing of fleeting memory still hold in transformer language models?

18.08.2025 12:40 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

A core idea in cognitive science is that the fleetingness of working memory isn't a flaw

It may actually help language learning by forcing a focus on the recent past and providing an incentive to discover abstract structure rather than surface details

18.08.2025 12:40 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Preview
Human-like fleeting memory improves language learning but impairs reading time prediction in transformer language models
Human memory is fleeting. As words are processed, the exact wordforms that make up incoming sentences are rapidly lost. Cognitive scientists have long believed that this limitation of memory may, para...

New preprint! w/ @drhanjones.bsky.social

Adding human-like memory limitations to transformers improves language learning, but impairs reading time prediction

This supports ideas from cognitive science but complicates the link between architecture and behavioural prediction
arxiv.org/abs/2508.05803

18.08.2025 12:40 πŸ‘ 11 πŸ” 2 πŸ’¬ 1 πŸ“Œ 0
Poster Presentation

On Wednesday, Maithe van Noort will present a poster on β€œCompositional Meaning in Vision-Language Models and the Brain”

First results from a much larger project on visual and linguistic meaning in brains and machines, with many collaborators -- more to come! 

t.ly/TWsyT

12.08.2025 11:14 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0