PhD position in Developmental Language Modelling
(PLZ RT)
What can human language acquisition teach us about training language models? Join us as a PhD student!
mpi.nl/career-education/vacancies/vacancy/fully-funded-4-year-phd-position-developmental-language @carorowland.bsky.social
@mpi-nl.bsky.social
10.03.2026 13:12
Thanks to everyone who gave us feedback: @lampinen.bsky.social, Ellie Pavlick, @glupyan.bsky.social, @phillipisola.bsky.social, and others!
Work with Tianyang Xu, @mudtriangle.com, Karen Livescu, and Greg Shakhnarovich!
10.03.2026 20:53
This relates more broadly to literature reconciling how meaning obtained from relational grounding in language interacts with meaning obtained from other forms of grounding (see Mollo and Millière / @raphaelmilliere.com), and lays out a research program on the role of category coherence in learning!
11/
10.03.2026 20:53
This suggests that representations learned from language are structured to expect incoming category information to cohere in a specific way in order to show cross-modal generalization!
10/
10.03.2026 20:53
Results from counterfactual shuffling experiments. Models tend to generalize well when coherence is preserved and poorly when it is disrupted, even in the absence of all hypernyms.
If models were generalizing arbitrarily, then we shouldn't see any differences in their performance across these settings (i.e., no matter what, crow == bird). However, we find that models seem to only generalize when the training data preserves category coherence!
9/
10.03.2026 20:53
Macro F1 scores on unseen images vs. visual coherence across the 53 hypernym categories for the Qwen3-1.7B backbone (at 100% ablation). r (Pearson's correlation) = .43, indicating a positive relationship.
By coherence we mean the visual similarity between members of the same category, which we calculate using the DINOv2 embeddings used in our VLM training. Even in the original configuration, we found models to perform better on categories that were visually more coherent.
8/
10.03.2026 20:53
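As a rough sketch (not the paper's code; all values below are illustrative), per-category visual coherence can be computed as the mean pairwise cosine similarity of that category's image embeddings, and the reported r is then just Pearson's correlation between per-category coherence and macro F1:

```python
import numpy as np

def visual_coherence(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine similarity among one category's image embeddings."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T                           # (n, n) cosine similarities
    n = len(embeddings)
    return float(sims[~np.eye(n, dtype=bool)].mean())  # drop self-similarities

rng = np.random.default_rng(0)
# Toy stand-ins for DINOv2 embeddings: one visually tight category, one scattered.
tight = rng.normal(loc=1.0, scale=0.05, size=(10, 8))
scattered = rng.normal(loc=0.0, scale=1.0, size=(10, 8))

# Given per-category coherence and macro-F1 arrays (values made up here),
# the reported statistic is Pearson's r:
coherence = np.array([0.30, 0.12, 0.25])
f1 = np.array([0.80, 0.40, 0.70])
r = np.corrcoef(coherence, f1)[0, 1]
```

On the toy data above, the tight cluster scores higher coherence than the scattered one, mirroring the visually-coherent vs. incoherent categories in the thread.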
Examples of image-leaf mappings resulting from our counterfactual shuffles, in comparison with the original configuration (top). VC indicates the visual coherence of the category under the data configuration. VC for birds in the original set: .30; for within-category shuffles: .30; for across-category shuffle: .12.
To test this, we created counterfactual data: 1) where category-label pairings were shuffled across categories (e.g., a kayak image labeled "robin") and 2) where they were shuffled within categories (e.g., a robin image labeled "crow"). These swaps also manipulate the categories' visual coherence.
7/
10.03.2026 20:53
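The two shuffle conditions can be sketched roughly like this (the data and function names are hypothetical, not the paper's code):

```python
import random

# Illustrative (image, leaf label, hypernym) triples, not the actual dataset.
data = [
    ("img1", "robin", "bird"), ("img2", "crow", "bird"),
    ("img3", "kayak", "boat"), ("img4", "canoe", "boat"),
]
leaf_hypernym = {leaf: hyper for _, leaf, hyper in data}

def shuffle_within(rows, seed=0):
    """Permute leaf labels only among images sharing a hypernym (coherence kept)."""
    rng = random.Random(seed)
    out = []
    for hyper in sorted({h for _, _, h in rows}):
        group = [r for r in rows if r[2] == hyper]
        labels = [leaf for _, leaf, _ in group]
        rng.shuffle(labels)
        out += [(img, leaf, hyper) for (img, _, _), leaf in zip(group, labels)]
    return out

def shuffle_across(rows, seed=0):
    """Permute leaf labels over the whole set (coherence disrupted): a kayak
    image may now be labeled 'robin', but 'robin' still maps to 'bird'."""
    rng = random.Random(seed)
    labels = [leaf for _, leaf, _ in rows]
    rng.shuffle(labels)
    return [(img, leaf, leaf_hypernym[leaf]) for (img, _, _), leaf in zip(rows, labels)]

within = shuffle_within(data)
across = shuffle_across(data)
```

The key invariant: within-category shuffles keep every image's label inside its original hypernym, while across-category shuffles can cross hypernym boundaries.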
Figure depicting two hypotheses that models might entertain: 1) arbitrary prediction of hypernyms regardless of what the input looks like during supervision; 2) sensitivity to the fact that the category (e.g., birds) is not visually coherent.
Are LMs simply executing something like "If crow THEN bird" regardless of what the image shows? E.g., if during supervision we label images of kayaks as "crow", would the model still generalize to birds, or does the model expect categories to have some level of coherence?
6/
10.03.2026 20:53
Main results (see fig 4 in the paper). Salient result: models tend to generalize to hypernyms without any evidence encountered during training, suggesting that they show cross-modal generalization.
Having established these preconditions for our task, we then find that models are also able to generalize (non-trivially) to hypernyms without ever having "seen" them explicitly, suggesting that LM representations support cross-modal generalization!
5/
10.03.2026 20:53
Left: plot showing that models using the DINOv2 encoder, which has never seen text, tend to generalize similarly to those using the SigLIP encoder, which has. Right: table showing that both Qwen3 LMs demonstrate non-trivial hypernymy knowledge.
We establish that this paradigm works in the first place with a vision encoder that has never been trained on language data (i.e., DINOv2 rather than SigLIP), that the models learn the task on the lower-level categories themselves, and that the LMs indeed have taxonomic knowledge.
4/
10.03.2026 20:53
3 papers on hypernym acquisition in models (Hearst, 1992; Geffet and Dagan, 2005) and humans (Wilson et al., 2023) - see paper for details.
Taxonomic knowledge is interesting because of the number of hypotheses about the learnability of category knowledge from linguistic cues, for both computational models and humans. Evidence of cross-modal generalization would lend strong support to these hypotheses!
3/
10.03.2026 20:53
Figure depicting an instance of our experiments. During training, the projector is deprived of explicit supervision on high-level categories (hypernyms, e.g., animal) to varying degrees, and is trained to detect the presence (and absence) of lower-level categories (e.g., koala), keeping the image encoder and the LM backbone frozen. After training, the VLM is tested for generalization to hypernym categories, given previously unseen images.
We use a VLM-training paradigm (frozen vision encoder w/o language training mapped to a frozen LM), where we partially supervise on lower-level categories during training, and then test whether the LM recovers hypernymy knowledge from what it has seen in language data.
2/
10.03.2026 20:53
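A toy numpy sketch of this setup, under stated assumptions: the "encoder" and "head" below are frozen random matrices standing in for DINOv2 and the LM backbone (all shapes and names are placeholders), and only the projector W is trained, with a binary cross-entropy presence/absence objective on leaf categories:

```python
import numpy as np

rng = np.random.default_rng(0)
IMG, VIS, LMD, LEAVES, N = 8, 16, 12, 6, 64

encoder = rng.normal(size=(IMG, VIS)) / 4   # frozen stand-in for the vision encoder
head = rng.normal(size=(LMD, LEAVES)) / 4   # frozen stand-in for the LM + label head
W = np.zeros((VIS, LMD))                    # projector: the ONLY trained parameters

X = rng.normal(size=(N, IMG))               # dummy "images"
feats = X @ encoder                         # frozen forward pass, computed once
W_true = rng.normal(size=(VIS, LMD))
Y = (feats @ W_true @ head > 0).astype(float)  # leaf presence/absence targets

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for _ in range(500):
    probs = sigmoid(feats @ W @ head)
    grad_W = feats.T @ (probs - Y) @ head.T / N   # BCE gradient w.r.t. W only
    W -= 0.3 * grad_W

loss = -np.mean(Y * np.log(probs + 1e-9) + (1 - Y) * np.log(1 - probs + 1e-9))
```

Because the logits are linear in W with everything else frozen, the objective is convex in the projector, so plain gradient descent drives the loss below its chance-level starting point (ln 2).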
title section of the paper: "Cross-Modal Taxonomic Generalization in (Vision) Language Models" by Tianyang Xu, Marcelo Sandoval-Castañeda, Karen Livescu, Greg Shakhnarovich, Kanishka Misra.
What is the interplay between representations learned from (language) surface forms alone, and those learned from more grounded evidence (e.g., vision)?
Excited to share new work understanding "Cross-modal taxonomic generalization" in (V)LMs
arxiv.org/abs/2603.07474
1/
10.03.2026 20:53
I want to unwatch this
10.03.2026 19:42
@tylerachang.bsky.social and I will be presenting the Goldfish as an oral at #LREC2026 in Mallorca!
09.03.2026 16:35
New Paper! How do reasoning LLMs handle inferences that have no deterministic answer? We find that they diverge from humans in some significant ways, and fail to reflect human uncertainty… (1/10)
04.03.2026 16:13
Check out our special theme: new missions for NLP research!
05.03.2026 22:39
What's a paper that made you think that way?
05.03.2026 22:32
I wrote a short article on AI Model Evaluation for the Open Encyclopedia of Cognitive Science!
Hope this is helpful for anyone who wants a super broad, beginner-friendly intro to the topic!
Thanks @mcxfrank.bsky.social and @asifamajid.bsky.social for this amazing initiative!
12.02.2026 22:22
Congratulations Andreas!!
03.03.2026 19:37
Some days you finish 5 meta-reviews in ~one go, and some days you take 1.5 days to complete one meta-review. Such is the AC life!
03.03.2026 15:35
Woohoo, will be in touch soon!
03.03.2026 05:17
Wow!! Good luck with whatever it is you do next -- so excited for you!!
03.03.2026 05:17
Watch Slow Horses already!!
02.03.2026 17:20
Japonaise and Jahunger mentioned in the same thread -- my fav places in Boston!
02.03.2026 17:19
South by Semantics Workshop:
"New horizons in evaluating pragmatic competence in language models", Jennifer Hu (Johns Hopkins University), March 6, 2026.
I'm looking forward to @jennhu.bsky.social's South by Semantics talk next week at UT Austin! She'll discuss "micro-pragmatics" inferences and world modeling in language models.
01.03.2026 20:36
Assistant Teaching Professor in Computational Social Science and Cognitive Science
University of California, San Diego is hiring. Apply now!
Our department is hiring an Assistant Teaching Professor!! This is a joint-appointed position with Computational Social Sciences (css.ucsd.edu). It's 75+ degrees F and sunny today, just thought I'd mention apol-recruit.ucsd.edu/JPF04461
27.02.2026 14:42
Congratulations Micha!! I'll be in Amsterdam in the first week of April -- I'd love to connect if you're around!
27.02.2026 14:17
Job update: Next week I start as a group leader at the Max Planck Institute for Psycholinguistics in Nijmegen @mpi-nl.bsky.social
Building the Language and Predictive Computation group -- using LLMs to model language in the mind/brain, and vice versa.
Hiring soon!
27.02.2026 10:27