What looks like a trivial formatting choice can actually alter research conclusions, so mind the gap!
Big thanks to my co-authors @minhducbui.bsky.social & Katharina von der Wense!
📖 Read the full paper here: arxiv.org/abs/2509.15020
Surprisingly, this small detail:
✅ Shifts model accuracy by up to 11%
✅ Changes which model tops the leaderboard, raising serious concerns about the comparability of LLM leaderboards in prior work
✅ Affects calibration (the reliability of confidence estimates)
In our #EMNLP2025 paper we study how the space before the answer letter (e.g., "A" vs. "␣A") is tokenized.
Practice is currently split: no community-wide standard exists, and even popular evaluation frameworks differ.
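A minimal sketch of the ambiguity (hypothetical prompt and variable names, not the paper's code): the full scored string is identical either way, but the two conventions disagree about where the space lives.

```python
# Sketch of the formatting ambiguity (hypothetical example, not the paper's code).
# The same multiple-choice item can be scored two ways that differ only in
# whether the space before the answer letter belongs to the prompt or the answer.

prompt = "Q: What is 2 + 2?\nA. 3\nB. 4\nAnswer:"

# Convention 1: the space is part of the prompt; the model is scored on "B".
prompt_with_space = prompt + " "
target_no_space = "B"

# Convention 2: the prompt ends at the colon; the model is scored on " B".
prompt_no_space = prompt
target_with_space = " B"

# The concatenated strings are byte-identical...
assert prompt_with_space + target_no_space == prompt_no_space + target_with_space

# ...but a BPE-style tokenizer typically merges a leading space into the
# following letter, so "B" and " B" become different tokens and the model
# assigns them different probabilities.
```

Because the merged token "␣B" and the bare token "B" are distinct vocabulary entries, the choice of convention changes which token's probability is compared across answer options.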
🧐 Evaluating your LLM with multiple-choice question answering?
🧵 A tiny space in the prompt can make accuracy jump by 11%, and even reshuffle model rankings.
#EMNLP2025 #NLP #AI #LLM #Evaluation