CMCL deadline extended to Feb 28 AoE!
The submission deadline for CMCL is coming up in less than a month! (Feb 25) CMCL will be co-located with LREC and take place on May 16! 🌴 https://sites.google.com/view/cmclworkshop/cfp
I had so many inspiring conversations with lovely colleagues and I am already looking forward to visiting again in the future! Many thanks to @simeonjunker.bsky.social, @bbunzeck.bsky.social, @manarali.bsky.social, @hbuschme.bsky.social, Clara Lachenmaier, Lisa Gottschalk, Emilie Sitter, Yu Wang ✨
I have just returned from a week-long visit to Bielefeld University! Thank you very much for hosting me, Sina Zarrieß and @ozgealacam.bsky.social! @clausebielefeld.bsky.social
This week we're having @ecekt.bsky.social as our guest in Bielefeld. She gave a highly timely talk on language+vision models, how they process images under noisy conditions, and how to train a highly effective multimodal BabyLM with model merging. 🗣️👏🏻
Photos from the Computational Psycholinguistics Meeting in Utrecht, many thanks to everyone who joined us in making this a memorable event! ✨
The CfP for CMCL is out! 🌴 We are looking forward to receiving many interesting submissions! ✨ (Deadline: February 25, 2026) sites.google.com/view/cmclwor...
Which song did they use? Money for Nothing?
Many thanks to @dnliu.bsky.social for inviting me, and to the members of the group for their insightful questions! ✨
The program of the Computational Psycholinguistics Meeting 2025 at Utrecht University is out, packed with very interesting talks! Registration is full, but there is a waiting list if you would like to attend ✨ cpl2025.sites.uu.nl/schedule/
The Cognitive Modeling and Computational Linguistics (CMCL) workshop will be co-located with LREC 2026 in Palma, Mallorca! 🌴 Stay tuned for more details! ✨
@byungdoh.bsky.social Tatsuki Kuribayashi @grambelli.bsky.social Philipp Wicke, Jixing Li, Ryo Yoshida @cmclworkshop.bsky.social
I was in Sweden this week! 🇸🇪 Many thanks to Nikolai Ilinykh for inviting me to give a talk at the University of Gothenburg. I enjoyed inspiring chats and delicious food with Sharid Loáiciga, @asayeed.bsky.social, Simon Dobnik, Hyewon Jang and Chris Howes at CLASP. Much appreciated!
I hope our findings will be helpful to future contributors to the multimodal track of the BabyLM challenge! aclanthology.org/2025.babylm-...
Instead of using the data provided in the BabyLM challenge, I opted to obtain it from its original sources, which added extra layers of filtering and complexity and revealed some discrepancies in the multimodal BabyLM data. I discuss these in the paper.
Unfortunately, we had limited time and resources to modify the whole evaluation pipeline for our specific multimodal architecture. As a result, we tested our models on a subset of the benchmarks.
The report on the Findings of the Third BabyLM Challenge indicates that the multimodal track received only 1 full submission this year. We submitted our paper to the workshop track instead of the challenge.
We experiment with weighted linear interpolation of language-only and multimodal model weights. Merging with language-only checkpoints alleviates the issue to some extent, improving performance on language-only benchmarks without heavily disrupting accuracy on multimodal tasks.
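Weight-space merging of this kind can be sketched in a few lines. Below is a minimal, hypothetical Python sketch of weighted linear interpolation between two checkpoints; the function name, the toy parameter values, and the single mixing coefficient `alpha` are illustrative assumptions, not the paper's exact recipe:

```python
def merge_checkpoints(lang_only, multimodal, alpha=0.5):
    """Parameter-wise interpolation of two checkpoints:
    merged = (1 - alpha) * lang_only + alpha * multimodal.
    Both checkpoints must share the same parameter names and shapes.
    """
    assert lang_only.keys() == multimodal.keys(), "checkpoints must match"
    return {
        name: [(1 - alpha) * l + alpha * m
               for l, m in zip(lang_only[name], multimodal[name])]
        for name in lang_only
    }

# Toy "checkpoints": one flattened parameter vector each.
ckpt_lang = {"embed.weight": [1.0, 2.0]}
ckpt_mm = {"embed.weight": [3.0, 4.0]}
merged = merge_checkpoints(ckpt_lang, ckpt_mm, alpha=0.5)
print(merged["embed.weight"])  # -> [2.0, 3.0]
```

In practice the same interpolation would run over real tensors (e.g. PyTorch state dicts), and `alpha` can be tuned to trade language-only benchmark gains against multimodal accuracy.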
How can we mitigate this issue in developmentally plausible multimodal models and maintain language-only performance? We explored model merging, a technique that has been shown to benefit multi-task and multi-language models, reducing the effects of catastrophic forgetting.
Our multimodal BabyLM model surpasses previous multimodal baselines and submissions on the leaderboard. Yet, compared to language-only models, it underperforms on grammar-oriented benchmarks, despite being exposed to the same language-only data as the language-only models (plus multimodal data).
Previous work, including BabyLM contributions, indicates that multimodal data has limited or no benefits in text-only benchmarks. We reach similar conclusions in our low-resource multimodal scenario.
I will be attending EMNLP in China to present our paper with @bylinina.bsky.social (who will be in China, too) and Jakub Dotlacil at the BabyLM workshop! Looking forward to meeting people there! ✨ #EMNLP2025 @emnlpmeeting.bsky.social
lnkd.in/e-Bzz6De
I felt very much at home at #ICCV2025! Here is the paper: arxiv.org/abs/2509.01453
Just got back from Hawaii, where I presented a workshop paper on image memorability at @iccv.bsky.social 🌺 Coming from multimodal NLP, it was my first time attending a CV conference. Everywhere I looked, there were talks and posters that were incredibly interesting!
Introducing BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data!
LLMs learn from vastly more data than humans ever experience. BabyLM challenges this paradigm by focusing on developmentally plausible data.
We extend this effort to 45 new languages!
I will be presenting this work at the @iccv.bsky.social 2025 workshop MemVis: The 1st Workshop on Memory and Vision! 🌺 Work done with Albert Gatt & Jakub Dotlacil arxiv.org/abs/2509.01453
What makes an image memorable? And can we predict image memorability using pretrained vision encoders? We explored activations, attention distributions, image patch uniformity, and sparse autoencoder losses computed from image representations across the layers of CLIP, DINOv2 and SigLIP2.
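One simple way to test whether such features carry memorability information is a linear probe: regress memorability scores on a scalar summary of frozen encoder activations. Here is a toy, pure-Python sketch (ordinary least squares on a single feature); the numbers are invented stand-ins, and in the actual setting each feature would be a per-image statistic from a CLIP, DINOv2, or SigLIP2 layer:

```python
def fit_linear_probe(features, scores):
    """Ordinary least squares for scores ~ a * feature + b,
    where each feature is one scalar summary of an image's
    encoder activations (e.g. mean activation at some layer)."""
    n = len(features)
    mx = sum(features) / n
    my = sum(scores) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(features, scores))
    var = sum((x - mx) ** 2 for x in features)
    a = cov / var          # slope: how strongly the feature tracks memorability
    b = my - a * mx        # intercept
    return a, b

# Invented data: four images' mean activations vs. memorability scores.
feats = [1.0, 2.0, 3.0, 4.0]
mem = [0.2, 0.4, 0.6, 0.8]
slope, intercept = fit_linear_probe(feats, mem)
```

Fitting one probe per layer (and per feature type) shows where in the encoder memorability-relevant information emerges, which is the kind of layer-wise comparison described above.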
taking the NS train, I do that multiple times a week :)
Could it be the HPLT v3.0 multilingual dataset? list.elra.info/mailman3/hyp...