
Ivan Kartáč

@ivankartac

PhD student @ Charles University. Working on evaluation, explainability, and reasoning in NLP.

77 Followers · 245 Following · 3 Posts · Joined 30.03.2025

Latest posts by Ivan Kartáč @ivankartac

"How Generative and Agentic AI Shift Concern from Technical Debt to Cognitive Debt"

This piece by Margaret-Anne Storey is the best explanation of the term cognitive debt I've seen so far: "Cognitive debt, a term gaining traction recently, instead communicates the notion that …"

Short musings on "cognitive debt": I'm seeing this in my own work, where excessive unreviewed AI-generated code leads me to lose a firm mental model of what I've built, which then makes it harder to confidently make future decisions. simonwillison.net/2026/Feb/15/...

15.02.2026 05:22 👍 465 🔁 88 💬 42 📌 20

The 5th Generation, Evaluation, and Metrics (GEM) Workshop will be at #ACL2026!

Call for papers is out. Topics include:
🐟 LMs as evaluators
🐠 Living benchmarks
🍣 Eval with humans
and more

New for 2026: Opinion & Statement Papers!

Full CFP: gem-workshop.com/call-for-pap...

27.01.2026 19:17 👍 21 🔁 7 💬 0 📌 1
Preview: "On Evaluating Cognitive Capabilities in Machines (and Other 'Alien' Intelligences)"

My latest on Substack -- a write-up of the talk I gave at NeurIPS in December.

aiguide.substack.com/p/on-evaluat...

14.01.2026 18:43 👍 122 🔁 36 💬 0 📌 4

OpeNLGauge comes in two variants: a prompt-based ensemble and a smaller fine-tuned model, both built exclusively on open-weight LLMs (including training data!).
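As a purely illustrative sketch of what a prompt-based judge ensemble might look like in principle (hypothetical names and interface, not the paper's actual implementation): each open-weight model is prompted to score the output, and the per-judge scores are aggregated.

```python
from statistics import mean

def ensemble_score(output_text, judges):
    """Aggregate per-judge quality scores into one metric value.

    `judges` is a list of callables, each standing in for a prompted
    open-weight LLM that returns a numeric score for the given text.
    Hypothetical interface, for illustration only.
    """
    scores = [judge(output_text) for judge in judges]
    return mean(scores)

# Stand-in "judges" that a real system would replace with LLM calls.
judges = [lambda t: 4.0, lambda t: 5.0, lambda t: 4.5]
print(ensemble_score("A generated summary.", judges))  # 4.5
```

Averaging is just one possible aggregation; a real ensemble could equally use a median or majority vote over discrete ratings.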

Thanks @tuetschek.bsky.social and @mlango.bsky.social!

23.08.2025 16:39 👍 1 🔁 0 💬 0 📌 0

We introduce an explainable metric for evaluating a wide range of natural language generation tasks, without any need for reference texts. Given an evaluation criterion, the metric provides fine-grained assessments of the output by highlighting and explaining problematic spans in the text.
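One way to picture the kind of output such a span-level metric produces (a hedged sketch with hypothetical names, not the paper's API): a list of problematic spans, each with a location, a severity, and a natural-language explanation, which can be rendered back onto the evaluated text.

```python
from dataclasses import dataclass

@dataclass
class ErrorSpan:
    # Hypothetical structure for one span-level judgement:
    # character offsets into the evaluated text plus an explanation.
    start: int
    end: int
    severity: str      # e.g. "minor" or "major"
    explanation: str

def apply_spans(text, spans):
    """Render annotated text by bracketing each problematic span.

    Illustrative only: a real LLM-based metric would produce the
    spans itself; here they are written by hand.
    """
    out, last = [], 0
    for s in sorted(spans, key=lambda s: s.start):
        out.append(text[last:s.start])
        out.append(f"[{text[s.start:s.end]} <-- {s.severity}: {s.explanation}]")
        last = s.end
    out.append(text[last:])
    return "".join(out)

text = "The restaurant serves cheap food food in the city centre."
spans = [ErrorSpan(28, 37, "minor", "repeated word")]
print(apply_spans(text, spans))
```

This prints the text with the flagged span bracketed and explained inline, which is the reference-free, criterion-driven shape of feedback the post describes.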

23.08.2025 16:37 👍 0 🔁 0 💬 1 📌 0

Our paper "OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs" has been accepted at #INLG2025!

You can read the preprint here: arxiv.org/abs/2503.11858

23.08.2025 16:36 👍 4 🔁 2 💬 1 📌 0

#ACL2025NLP in Vienna 🇦🇹 starts today with 23 🤯 @ufal-cuni.bsky.social folks presenting their work both at the main conference and workshops. Check out our main conference papers today and on Wednesday 👇

28.07.2025 07:27 👍 22 🔁 8 💬 1 📌 1
Preview: "Evaluating LLM outputs with humans and LLMs" — Ondřej Dušek, MLPrague, 30 April 2025

Slides and links to papers at bit.ly/mlprague25-od 🤓

02.05.2025 19:25 👍 2 🔁 2 💬 0 📌 0

Today, @tuetschek.bsky.social shared his team's work on evaluating LLM text generation with both human annotation frameworks and LLM-based metrics. Their approach tackles the benchmark data leakage problem, showing how to obtain unseen data for unbiased LLM testing.

30.04.2025 12:02 👍 8 🔁 3 💬 1 📌 0
Preview: "Large Language Models as Span Annotators" (paper website)

How do LLMs compare to human crowdworkers in annotating text spans? 🧑🤖

And how can span annotation help us with evaluating texts?

Find out in our new paper: llm-span-annotators.github.io

Arxiv: arxiv.org/abs/2504.08697

15.04.2025 11:10 👍 20 🔁 7 💬 1 📌 2