For folks considering grad school in ML, my advice is to explore programs that mix ML with a domain interest. ML programs are wildly oversubscribed while a lot of the fun right now is in figuring out what you can do with it
So, what *is* the @ecir2026.eu Information Retrieval for Good track? by Maria Heuss and Bhaskar Mitra:
https://bhaskar-mitra.github.io/posts/2025/09/01/what-is-ir-for-good/
Super important paper and what a nice interdisciplinary group of co-authors!!!
We present our new preprint titled "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation". We quantify LLM hacking risk through systematic replication of 37 diverse computational social science annotation tasks. For these tasks, we use a combined set of 2,361 realistic hypotheses that researchers might test using these annotations. Then, we collect 13 million LLM annotations across plausible LLM configurations. These annotations feed into 1.4 million regressions testing the hypotheses. For a hypothesis with no true effect (ground truth p > 0.05), different LLM configurations yield conflicting conclusions. Checkmarks indicate correct statistical conclusions matching ground truth; crosses indicate LLM hacking, i.e., incorrect conclusions due to annotation errors. Across all experiments, LLM hacking occurs in 31-50% of cases even with highly capable models. Since minor configuration changes can flip scientific conclusions from correct to incorrect, LLM hacking can be exploited to present anything as statistically significant.
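To build intuition for the mechanism, here is a toy simulation in my own words (not the paper's code; the function name `pvalue_under_config` and all parameters are made up for illustration). There is no true relation between a covariate and the labels, but if an annotator's errors correlate with the covariate, a regression on the annotated labels can turn significant:

```python
import numpy as np
from scipy import stats

def pvalue_under_config(bias: float, n: int = 2000, seed: int = 0) -> float:
    """p-value of regressing annotated labels on covariate x, for one
    simulated 'annotator configuration' whose false positives scale with x."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                 # covariate the researcher studies
    y_true = rng.binomial(1, 0.5, size=n)  # ground truth: independent of x
    # one-sided annotation errors: spurious positives, more likely for high x
    p_fp = 1.0 / (1.0 + np.exp(-(bias * x - 2.0)))
    y_hat = np.maximum(y_true, rng.binomial(1, p_fp))
    return stats.linregress(x, y_hat).pvalue

print(pvalue_under_config(bias=0.0))  # errors unrelated to x: null usually holds
print(pvalue_under_config(bias=1.5))  # errors track x: spurious significance
```

The point of the sketch: nothing about the hypothesis changed, only the error structure of the annotations, yet the biased "configuration" yields a tiny p-value on a true null.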
🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.
Paper: arxiv.org/pdf/2509.08825
Curious about my PhD research?
Watch a 10-min talk + my defense: lnkd.in/ej_MWDtt
Read the dissertation: lnkd.in/efBW97WB
Or read the short news article: lnkd.in/eizZg5VN
Amazing co-authors broadened my perspective and made me a better scientist. Thank you so much for that!
Also to my doctoral committee: @damiantrilling.net, Annette Hautli-Janisz, Reshmi G. Pillai, @Khalid Al Khatib & Antal van den Bosch: thank you for your thoughtful (and fun!) questions.
And huge thanks to my incredible paranymphs @urjakh.bsky.social and Selene Baez Santamaria. From Zoom rooms to the stage, our journey has been full of growth, laughter, and mutual support.
In fact, all PhDs from @cltl.bsky.social were a great community of support.
Last week, I defended my dissertation "A Puzzle of Perspectives: Interdisciplinary Language Technology for Responsible News Recommendation" at the Vrije Universiteit Amsterdam. *the* moment: #PhDone!
I couldn't have asked for better supervisors than Antske Fokkens & @suzanv.bsky.social
It's the final countdown (I am re-reading my dissertation for my defense next week), and I realized I had some fun findings hidden in some papers that I myself had forgotten about! I don't know if that's a good or a bad sign for my defense...
But then working as a (university) researcher also comes with a lot of downsides, including insecurity and pressure in random "which grant or paper wins" arenas, which I do not vibe well with.
But what then? What do?
Btw I'm serious about this career change comment.
I'm having a sort of post-PhD career reflection, where I realize that these kinds of things don't spark joy for me but seem to be a big part of being an AI dev in industry.
I mean, I have heard people say they enjoy the puzzling aspect and feel accomplished when they fix it.
Personally, for me that never weighs up against the annoyance and what feels like endless wasted time.
Also, I realize some people really love the "puzzle" aspect, but I don't like this kind of puzzle. It makes me stressed and annoyed. Maybe I should find another field to work in.
I also really hate it when people who do not work in NLP/LLMs say "oh no, but with conda and a requirements.txt it's easy, right?", not realizing the morass of ever-new models and architectures I live in.
Realization: I really, really, really hate the part of my job that is managing conda environments and digging through a deep, deep cave of issue reports trying to find out why something randomly doesn't work.
Chatbots (LLMs) do not know facts and are not designed to accurately answer factual questions. They are designed to find and mimic patterns of words, probabilistically. When they're "right", it's because correct things are often written down, so those patterns are frequent. That's all.
Deadline approaching! Workshop on Computational Linguistics for the Political and Social Sciences at #KONVENS2025: archival long and short papers (ACL Anthology) & non-archival abstracts and PhD project descriptions (get feedback from a great community!). Deadline: June 13th.
My love language is sending my academic friends the papers, datasets, and social media posts that I know align with their research interests.
GESIS Workshop "Adapters: Lightweight Machine Learning for Social Science Research", 02 to 04 June 2025 | Hybrid (Cologne | Online). With Julia Romberg, Vigneshwaran Shankaran, and Maximilian Maurer (all GESIS).
Unlock the power of large language models for your research!
Join this #GESISworkshop with Julia Romberg, @vigneshwaran-s.bsky.social, and @mmmaurer.bsky.social to explore adapters, an efficient alternative to full fine-tuning of your models.
Book now: t1p.de/adapters-lig...
@gesis.org
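For intuition on why adapters are lightweight: a bottleneck adapter inserts a small down-project/up-project module with a residual connection into a frozen pretrained model, and only the adapter's parameters are trained. A minimal sketch (my illustration with assumed dimensions, not the workshop's material):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project,
    plus a residual connection. Only these few parameters are trained;
    the surrounding pretrained model stays frozen."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # residual connection keeps the pretrained representation intact
        return h + self.up(self.act(self.down(h)))

adapter = Adapter(hidden_dim=768)
trainable = sum(p.numel() for p in adapter.parameters())
print(trainable)  # 99136 parameters, vs. ~110M for a BERT-base encoder
```

With a hidden size of 768 and bottleneck of 64, one adapter adds under 100k trainable parameters, which is why a stack of them is so much cheaper than updating the full model.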
While I am not at #NAACL, I gave a talk about this paper (and more work from my dissertation) last Friday at @annarogers.bsky.social's lab. Very nice discussion there!
Paper: lnkd.in/eBBSi6_p
Code: lnkd.in/ezwRGpjP
Slides: lnkd.in/erPP5fpV
Want to know more? Message me!
We find that:
- Experts use different strategies to assess the LLM;
- Surprisingly, longer and more nuanced definitions of sexism are developed via LLM-human collaboration;
- Some experts improve zero-shot performance with their improved definition.
#NLProc #CSS #computationalsocialscience
Our study consisted of four components:
1) a survey of sexism researchers;
2) & 3) two interactive experiments on expert-LLM interaction: 2) assessing the LLM and 3) co-creating sexism definitions with the LLM;
4) using these definitions in zero-shot detection with LLMs on five sexism datasets.
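For readers curious what step 4 looks like in practice, definition-conditioned zero-shot prompting can be sketched roughly like this (a hypothetical illustration; the prompt wording and helper names are mine, not the paper's actual setup):

```python
# Hypothetical sketch of definition-conditioned zero-shot classification:
# the expert's definition is inserted into the prompt, and the model's
# free-text answer is mapped back to a binary label.
def build_prompt(definition: str, text: str) -> str:
    return (
        "You are annotating social media posts for sexism.\n"
        f"Definition of sexism: {definition}\n"
        f"Post: {text}\n"
        "Answer with exactly one word, 'sexist' or 'not-sexist':"
    )

def parse_label(model_output: str) -> int:
    # map the model's answer to a binary label (1 = sexist)
    return int("not" not in model_output.strip().lower())

prompt = build_prompt(
    "Sexism is prejudice or discrimination based on a person's gender.",
    "Example post text goes here.",
)
print(parse_label("sexist"), parse_label("Not-sexist"))
```

Because the definition is a free slot in the prompt, swapping in each expert's co-created definition and re-running the same loop is what lets one compare definitions by downstream classification performance.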
This work was the outcome of my Junior Research Visit grant at @gesis.org last year, and it is the final chapter of my dissertation!
Our method allowed us to measure connections between experts, sexism definitions, datasets, & classification performance in zero-shot sexism classification.
A visual description of how our expert survey led to two interactive experiments and finally to definitions that were used in zero-shot sexism detection.
Expert + LLM = Better Sexism Detection?
Paper:
"Tell Me What You Know About Sexism: Expert-LLM Interaction Strategies and Co-Created Definitions for Zero-Shot Sexism Detection"
w/ @indiiigo.bsky.social, @matteo-mls.bsky.social & @gabriellalapesa.bsky.social
@ Findings of #NAACL2025!
Oh it is super common in Amsterdam! I see it all the time.
And I have even seen it in Mexico, so it is definitely a worldwide phenomenon, an international vibe-working trend.
I am now doing a lot of stuff locally on my M1 Mac, and while it is an interesting challenge, it also has very obvious limitations.
🚨 Deadline Extended! 🚨
We've extended the submission deadline to Friday, April 18, 2025 (AoE)!
Please share widely!
www.workshopononlineabuse.com/cfp.html
ACL Rolling Review and the EMNLP PCs are seeking input on the current state of reviewing for *CL conferences. We would love to get your feedback on the current process and how it could be improved. To contribute your ideas and opinions, please follow this link! forms.office.com/r/P68uvwXYqfemn
Join us at the VU Amsterdam's Master's Event, Saturday, March 8, 10:30-15:00!
Learn about our two Master's programs in Linguistics from faculty and students: Language and AI (1 year) and Human Language Technology (2 years).
Programs: home.cltl.labs.vu.nl
Location & details: vu.nl/en/education...