(@ehudreiter) — KonKok

I start my last-ever course today, an MSc course on Natural Language Generation. My last lecture (on NLG evaluation) will be on 20 April. Hard to believe...

09.03.2026 09:10 👍 0 🔁 1 💬 0 📌 0

New blog: Questions from readers of my book

A group who is reading my book sent me many questions, some of which we discussed in a call last week. I thought I would share the questions and my responses.

ehudreiter.com/2026/03/03/q...

03.03.2026 09:18 👍 1 🔁 2 💬 0 📌 0

Great to see that my student Jawwad Baig has submitted his PhD! One of my main goals for 2025-26 is to help 6 PhD students submit before I retire. Halfway through the academic year, and three of the six have now submitted, so on track.

02.03.2026 09:24 👍 1 🔁 0 💬 0 📌 0

If you're not on the SIGGEN mailing list or in the NLG Discord server, you might not have seen that Barkavi Sundarajan has been leading a reading group about @ehudreiter.bsky.social's new book "Natural Language Generation".

Join us Friday, 27 Feb, at 2pm UK time: discord.gg/hysgkK7Q?eve...

25.02.2026 14:14 👍 2 🔁 2 💬 1 📌 0

My PhD student Adarsa Sivaprasad is looking for people who have lived experience of IVF to help evaluate an AI chatbot which explains IVF outcome predictions.

What is involved: 45 min online MS Teams call.
Read details and sign up at: tinyurl.com/cc2aepf5

25.02.2026 09:20 👍 0 🔁 0 💬 0 📌 0

I'd love to see old friends and meet new colleagues at my retirement symposium!

23.02.2026 09:27 👍 0 🔁 1 💬 0 📌 0

New blog: Don't ignore omissions!

Evaluation of LLMs focuses on accuracy and hallucination. Completeness and omission are also important: does the text include all the key information? Omissions are a huge problem in medical NLG, and in other NLG tasks as well.

ehudreiter.com/2026/02/11/d...

11.02.2026 09:45 👍 5 🔁 2 💬 0 📌 0
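To make the omissions point concrete, here is a minimal sketch of a completeness check, assuming we already have a hypothetical list of key facts that a good text should mention. The naive case-insensitive substring matching is an illustrative assumption only; real omission evaluation (especially in medical NLG) usually needs human judgement or much more careful matching.

```python
# Minimal omission/completeness check (illustrative sketch).
# ASSUMPTION: key facts are short strings and "present" means a naive
# case-insensitive substring match; real evaluations need better matching.


def find_omissions(generated_text: str, key_facts: list[str]) -> list[str]:
    """Return the key facts that do not appear in the generated text."""
    text = generated_text.lower()
    return [fact for fact in key_facts if fact.lower() not in text]


def completeness(generated_text: str, key_facts: list[str]) -> float:
    """Fraction of key facts present in the text (1.0 means nothing omitted)."""
    if not key_facts:
        return 1.0
    missing = find_omissions(generated_text, key_facts)
    return 1 - len(missing) / len(key_facts)


if __name__ == "__main__":
    # Hypothetical example, not taken from the blog post.
    summary = "Blood pressure is high. Start the new medication tomorrow."
    facts = ["blood pressure is high", "start the new medication", "avoid alcohol"]
    print(find_omissions(summary, facts))  # ['avoid alcohol']
    print(round(completeness(summary, facts), 2))  # 0.67
```

Note that this text would score perfectly on accuracy (nothing in it is false) while still omitting a key piece of advice, which is exactly the gap that accuracy-only evaluation misses.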

Reliability of LLMs as medical assistants for the general public: a randomized preregistered study - Nature Medicine In a randomized controlled study involving 1,298 participants from a general sample, performance of humans when assisted by a large language model (LLM) was sensibly inferior to that of the LLM alone ...

Fascinating paper on problems using LLMs to respond to health queries. LLMs do well on standard medical benchmarks but struggle (for example) to understand symptoms presented in a confused way

www.nature.com/articles/s41...

10.02.2026 16:40 👍 2 🔁 0 💬 0 📌 0

Newer AI Coding Assistants Are Failing in Insidious Ways One AI coding assistant power user says the tools are hitting a plateau, and some are even declining. What's causing this unexpected twist in tech?

Really interesting article on AI coding assistants, which may be getting worse because of lower-quality training data

spectrum.ieee.org/ai-coding-de...

10.02.2026 15:40 👍 1 🔁 0 💬 0 📌 0

This is the first conference which my daughter Naomi (who is doing a PhD in medieval history) has helped to organise. Good luck to her and fellow organisers!

06.02.2026 13:04 👍 5 🔁 0 💬 0 📌 1

Nice talk by my colleague Jakub Zbrzeżny (Aberdeen Divinity Dept) on using LLMs to translate biblical texts into and out of a local Arabic dialect (in Hebron). Basically LLMs understand the dialect, but cannot produce it. Will this encourage more young people to abandon their dialect?

05.02.2026 09:37 👍 0 🔁 0 💬 0 📌 0

Very excited to have a retirement symposium on NLG evaluation! Looking forward to seeing old friends and meeting new people!

05.02.2026 09:29 👍 2 🔁 0 💬 0 📌 0

A friend asked what I am focusing on in my last half-year before retirement. Largely getting 5 PhD students to submit their PhDs and have vivas. Fortunately all seem to be on track!

04.02.2026 09:45 👍 1 🔁 0 💬 0 📌 0

Do a sanity check on your experiments I strongly recommend that researchers do “sanity checks” on data, model outputs, and evaluation results, looking for anomalies. This can help detect data errors, model cheating, softwar…

I recently wrote a blog saying that authors should do sanity checks on their papers. Readers should as well! I recently read an interesting paper, but a sanity check showed that the claims in the paper did not match the data. Either the paper was hallucinated or the authors were sloppy.

ehudreiter.com/2025/12/22/d...

03.02.2026 09:26 👍 0 🔁 0 💬 0 📌 0

Interesting paper about different techniques to evaluate the performance of medical decision support. It concludes that F1 is the worst technique; a shame that it is so heavily used in NLP and AI.

doi.org/10.1016/j.la...

02.02.2026 16:00 👍 0 🔁 1 💬 0 📌 0

I liked a recent Economist article, "How to avoid common AI pitfalls in the workplace" (paywall). It makes little mention of benchmarks or AGI (but does mention the contrast between rising benchmark scores and limited real-world impact). Instead it focuses on pragmatic issues such as workflow.

02.02.2026 10:00 👍 0 🔁 0 💬 0 📌 0

The 5th Generation, Evaluation, and Metrics (GEM) Workshop will be at #ACL2026!

Call for papers is out. Topics include:
🐟 LMs as evaluators
🐠 Living benchmarks
🍣 Eval with humans
and more

New for 2026: Opinion & Statement Papers!

Full CFP: gem-workshop.com/call-for-pap...

27.01.2026 19:17 👍 21 🔁 7 💬 0 📌 1

New blog: My Eureka moments in research

The most exciting moments of my career were discovering something new and exciting about NLG, etc. I describe a few of my “Eureka” moments. They are what I remember best, much more than acceptance of papers.

ehudreiter.com/2026/01/30/m...

30.01.2026 09:02 👍 2 🔁 0 💬 0 📌 0

Realised that a lot of the URLs for older papers do not work in my publications page (ehudreiter.com/publications/) . For journal papers, I am replacing dead links with DOIs, and will use only DOIs in future!

29.01.2026 10:46 👍 1 🔁 0 💬 0 📌 0

New blog: Let's use AI to help people manage illness

I am excited by the idea of using AI to help people manage illness and health conditions. This isn't very sexy, but I think there is real potential to improve health outcomes and quality of life.

ehudreiter.com/2026/01/19/l...

19.01.2026 09:22 👍 0 🔁 0 💬 0 📌 0

Other CS academics I know have done very different things in retirement: remaining active in academia as emeritus, joining a startup, doing charitable work, moving to a remote spot in the Scottish Highlands, writing novels, etc. We did similar things as academics (research and teaching), but very different things in retirement!

16.01.2026 09:16 👍 2 🔁 0 💬 0 📌 0

West Midlands police chief apologises after AI error used to justify Maccabi Tel Aviv ban Craig Guildford says he gave incorrect evidence to MPs and mistake arose from ‘use of Microsoft Copilot’

AI hallucination is in the UK political news. Israeli fans were banned from a football match, and this ban was based on a report which included hallucinated material made up by MS Copilot

www.theguardian.com/uk-news/2026...

14.01.2026 15:20 👍 0 🔁 0 💬 0 📌 0

‘Dangerous and alarming’: Google removes some of its AI summaries after users’ health put at risk Guardian investigation finds AI Overviews provided inaccurate and false information when queried over blood tests

Health experts: Your synthetic text "AI" overviews are misleading, for example see this about liver function tests.
Google: Okay, we'll block "AI" overviews on that query.

The product is fundamentally flawed and cannot be "fixed" by patching query by query.

A short 🧵>>

11.01.2026 14:27 👍 580 🔁 215 💬 11 📌 9

Nice chat with some of my soon-to-submit PhD students. They all know how to conduct and write up research, have lots of ideas for future work, and have developed networks of collaborators. So they are ready to "leave the nest", which is a good feeling for me as supervisor.

08.01.2026 09:54 👍 1 🔁 0 💬 0 📌 0

New blog (personal): Retirement Plans: Travel and some academics

I hope to retire soon, and many people are asking about my plans. Basically I want to do lots of travel, stay involved in academia, and perhaps do some writing.

ehudreiter.com/2026/01/06/r...

06.01.2026 08:24 👍 2 🔁 0 💬 0 📌 0

One nice thing about 2025 was that the two publications I was proudest of were single-author! Also many good papers with my students, but I get a special buzz from single-author papers

01.01.2026 13:46 👍 2 🔁 0 💬 0 📌 0

New blog: Do a sanity check on your experiments

Researchers should do a “sanity” check on experiments. That is, manually inspect some (A) test/train data, (B) model/system output, and (C) evaluation output, looking for anything that seems strange.
ehudreiter.com/2025/12/22/d...

22.12.2025 09:05 👍 4 🔁 0 💬 0 📌 0
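The sanity check in this post is fundamentally manual: a person reads some data, outputs, and evaluation results. As a rough sketch of tooling that could support it, the code below (A) draws a reproducible random sample of items to read by hand and (B) flags a few obviously strange outputs. The specific heuristics (empty outputs, duplicated outputs, scores more than `z` standard deviations from the mean) and the function names are illustrative assumptions, not a method from the blog; they complement reading the items, they do not replace it.

```python
import random
from statistics import mean, pstdev


def sample_for_inspection(items, k=5, seed=0):
    """Pick a reproducible random sample of items to read by hand."""
    rng = random.Random(seed)
    return rng.sample(items, min(k, len(items)))


def flag_anomalies(outputs, scores, z=3.0):
    """Return sorted indices of outputs that look strange (illustrative heuristics):
    empty text, text duplicated elsewhere, or a score more than z population
    standard deviations away from the mean score."""
    flagged = set()
    first_seen = {}
    for i, text in enumerate(outputs):
        if not text.strip():
            flagged.add(i)  # empty or whitespace-only output
        if text in first_seen:
            flagged.update({i, first_seen[text]})  # flag both copies
        else:
            first_seen[text] = i
    sd = pstdev(scores)
    if sd > 0:  # all-identical scores have no outliers
        mu = mean(scores)
        flagged.update(i for i, s in enumerate(scores) if abs(s - mu) / sd > z)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical model outputs and evaluation scores.
    outputs = ["The patient is stable.", "", "Take one tablet daily.", "Take one tablet daily."]
    scores = [0.8, 0.1, 0.7, 0.7]
    print(flag_anomalies(outputs, scores))  # [1, 2, 3]
    for item in sample_for_inspection(outputs, k=2):
        print(repr(item))  # read these by hand
```

Flagged items are only a starting point: the point of the post is that a human should look at them (and at a random sample) and ask whether anything seems wrong.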

One of my main goals for 2025-26 is to get 6 PhD students to submit before I retire in summer 2026. So I am very happy that Nikolay Babakov has submitted and passed his viva, and Iniakpokeikiye Thompson has submitted. Getting there...

16.12.2025 10:11 👍 1 🔁 0 💬 0 📌 0

A colleague has discovered many bugs (e.g., incorrect annotations) in a respected 8-year-old dataset he is using. Nobody warned him, and it is hard for him to warn others. Maybe most people just don't care if a dataset is deeply flawed, as long as they can compute numbers and beat SOTA...

15.12.2025 09:02 👍 1 🔁 0 💬 0 📌 0

Making a good LLM benchmark is hard: avoid data contamination, reward hacking, and saturation; ensure construct validity; rigorously test and validate, etc.

Unfortunately, the community places little value on the above. People want to beat SOTA or competitors, and don't care if the benchmark they use means anything...

10.12.2025 07:55 👍 3 🔁 0 💬 0 📌 0