Jacob Eisenstein

@jacobeisenstein

natural language processing and computational linguistics at google deepmind.

5,642
Followers
2,409
Following
211
Posts
25.07.2023
Joined
Latest posts by Jacob Eisenstein @jacobeisenstein

Post image

Newish work (arXived in December):
Prompts can be ambiguous, but handling ambiguity is context- and user-dependent. Sometimes the right thing is to ask a clarifying question, sometimes to give multiple answers, and sometimes to just guess. Can we train steerable models that change their strategy per context?

06.03.2026 00:24 πŸ‘ 2 πŸ” 1 πŸ’¬ 1 πŸ“Œ 1
MT-PingEval: Evaluating Multi-Turn Collaboration with Private Information Games We present a scalable methodology for evaluating language models in multi-turn interactions, using a suite of collaborative games that require effective communication about private information. This e...

For more discussion, please see the paper! arxiv.org/abs/2602.24188

While AI models may struggle to collaborate, at Google DeepMind my collaborators are proactive and fully coherent. Thanks to @fantinehuot.bsky.social, Adam Fisch, @jonathanberant.bsky.social, and Mirella Lapata!

04.03.2026 00:15 πŸ‘ 12 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
Post image Post image

AI systems are also overconfident, terminating dialogues long before exhausting their turn budget - even after explicit reminders.

04.03.2026 00:15 πŸ‘ 8 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
Post image Post image

On most games, performance is flat or even decreasing. What went wrong?

Using the classic NLP toolbox, we find that AI models suffer from low discourse coherence, leading to weak performance despite relatively high information density - even when using twice as many tokens as humans.

04.03.2026 00:15 πŸ‘ 8 πŸ” 1 πŸ’¬ 2 πŸ“Œ 0
Post image

So how well do today's models do?

To answer this, we design a new multi-turn scaling analysis, called *isotoken evaluation*: fix a total token budget, and partition it into variable numbers of turns.

Performance should be non-decreasing in the number of turns... and yet!
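A minimal sketch of what an isotoken evaluation loop might look like, assuming a fixed total budget split evenly across turns. The names `isotoken_schedule`, `run_isotoken_eval`, and `play_game` are illustrative, not from the paper's code.

```python
# Sketch of an "isotoken" evaluation: hold the total token budget fixed
# and vary how many turns it is partitioned across.

def isotoken_schedule(total_budget: int, num_turns: int) -> list[int]:
    """Partition a fixed token budget into per-turn allowances."""
    base, rem = divmod(total_budget, num_turns)
    # Spread any remainder over the earliest turns so the sum is exact.
    return [base + (1 if i < rem else 0) for i in range(num_turns)]

def run_isotoken_eval(play_game, total_budget=1024, turn_counts=(1, 2, 4, 8)):
    """Score the same game under different turn partitions of one budget."""
    results = {}
    for k in turn_counts:
        schedule = isotoken_schedule(total_budget, k)
        assert sum(schedule) == total_budget  # budget is held constant
        results[k] = play_game(schedule)  # hypothetical game runner
    return results
```

Under this setup, more turns means more chances to exchange private information at the same total cost, which is why a score curve that is flat or decreasing in `k` signals a coordination failure rather than a budget limit.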

04.03.2026 00:15 πŸ‘ 6 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
Post image

We believe these games are more naturalistic and proactive than most existing multi-turn evaluations, which often employ user simulators to create multi-turn user-assistant scenarios.

Here's another game, which requires answering a question about two privately-held images.

04.03.2026 00:15 πŸ‘ 4 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
Post image

This task is part of πŸ“ MT-PingEval, a new benchmark of verifiable collaborative private information games that involve multi-turn dialogue.

In this game, the "describer" sees only a single image, and the "guesser" has to identify which one it is.

04.03.2026 00:15 πŸ‘ 4 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
Post image

Are AI models effective collaborators, or mere assistants awaiting your next command? (Preprint: arxiv.org/abs/2602.24188)

To find out, we make AI collaborate with itself, in private information games: tasks that require sharing private information, like this chess board ordering task.

04.03.2026 00:15 πŸ‘ 54 πŸ” 21 πŸ’¬ 3 πŸ“Œ 1

This looks like it’ll be a fantastic intro to transformers ⚑️

01.03.2026 05:00 πŸ‘ 23 πŸ” 4 πŸ’¬ 1 πŸ“Œ 0
Post image

[1/n] Just wrapped up 7 months interning with @pcastr.bsky.social at Google DeepMind and I'm so excited to share our work: arxiv.org/abs/2602.10324.

TLDR: We used LLM-powered program synthesis to automatically model and discover differences between human and LLM strategic behavior

16.02.2026 22:46 πŸ‘ 79 πŸ” 13 πŸ’¬ 2 πŸ“Œ 2
Master Level Intern, France. Locations: Grenoble, France; Paris, France

🚨 πŸ”¬ PhD positions at Google DeepMind in France πŸ‡«πŸ‡·

We are advertising Master Level Intern positions at Google DeepMind within our Frontier AI Unit.

These could lead to co-advised PhD positions with Google DeepMind and French academic institutions.

job-boards.greenhouse.io/deepmind/job...

16.02.2026 12:41 πŸ‘ 30 πŸ” 17 πŸ’¬ 2 πŸ“Œ 0
Be aware of toilet rats, King County says Rats could climb into your toilet as a result of recent flooding from atmospheric rivers, King County officials say. Enter "wet rat winter."

meanwhile, in the pnw

www.seattletimes.com/seattle-news...

20.12.2025 17:32 πŸ‘ 4 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
2025 Wrap-up: Fine-tuning Gemma with Kauldron Example ✦︎ · Issue #1414 · google-deepmind/open_spiel Hello everyone! We've been hard at work this year working on OpenSpiel 2.0, which will be better than ever. Major developments have been underway to make working with language models easier. I'm lo...

Hello! πŸ‘‹

Are you interested in AI for board games using language models? Want to do some hobby tinkering with fine-tuning or RL?

We've released an easy-to-follow example colab that fine-tunes Gemma models via Kauldron to mimic an MCTS player.

Details here: github.com/google-deepm...

β™ŸοΈπŸŽ²β™¦οΈβ™ οΈβ™₯οΈβ™£οΈβœ¨πŸŽ‰

19.12.2025 18:35 πŸ‘ 40 πŸ” 7 πŸ’¬ 2 πŸ“Œ 2

Day 1 of #BooksAreMyJam!

Blueberry Maple jam, with Linguaphile: A Life of Language Love by Julie Sedivy.

A classic Canadian flavour duo + this book about @juliesedivy.bsky.social's relationship with language through her childhood in Montreal, later research as a linguist, and more

01.12.2025 16:55 πŸ‘ 109 πŸ” 11 πŸ’¬ 5 πŸ“Œ 6
Dwarkesh Patel @dwarkesh_sp
X.com
"The thing that happened with AGI and pretraining is that in some sense they overshot the target.
You will realize that a human being is not an AGI.
Because a human being lacks a huge amount of knowledge. Instead, we rely on continual learning.
If I produce a super intelligent 15-year-old, they don't know very much at all. A great student, very eager. [You can say,] 'You go and be a programmer. You go and be a doctor. Go and learn.'
So you could imagine that the deployment itself will involve some kind of a learning trial and error period. It's a process as opposed to, you drop the finished thing."
@ilyasut


this is the theme β€” you can’t have AGI without existing in and learning from the real world

25.11.2025 18:27 πŸ‘ 21 πŸ” 2 πŸ’¬ 1 πŸ“Œ 0
Photo of Cornell University building surrounded by colorful trees

No better time to start learning about that #AI thing everyone's talking about...

πŸ“’ I'm recruiting PhD students in Computer Science or Information Science @cornellbowers.bsky.social!

If you're interested, apply to either department (yes, either program!) and list me as a potential advisor!

06.11.2025 16:19 πŸ‘ 23 πŸ” 9 πŸ’¬ 1 πŸ“Œ 0

knowing how to tie your shoes or order a drink in a crowded bar: not agi

naming the big five personality traits: definitely agi

17.10.2025 22:01 πŸ‘ 7 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

nice summary of everybody’s new fave

17.10.2025 17:40 πŸ‘ 4 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
Post image

Nicholas Carlini asking the right questions at #COLM2025

09.10.2025 13:05 πŸ‘ 5 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Here’s a #COLM2025 feed!

Pin it πŸ“Œ to follow along with the conference this week!

06.10.2025 20:26 πŸ‘ 26 πŸ” 17 πŸ’¬ 2 πŸ“Œ 1
Screenshot of paper title: Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence


AI always calling your ideas β€œfantastic” can feel inauthentic, but what are sycophancy’s deeper harms? We find that in the common use case of seeking AI advice on interpersonal situationsβ€”specifically conflictsβ€”sycophancy makes people feel more right & less willing to apologize.

03.10.2025 22:53 πŸ‘ 115 πŸ” 48 πŸ’¬ 2 πŸ“Œ 7
Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers


Excited to share a new paper that aims to narrow the conceptual gap between the idealized notion of Kolmogorov complexity and practical complexity measures for neural networks.

01.10.2025 14:11 πŸ‘ 9 πŸ” 5 πŸ’¬ 1 πŸ“Œ 0

πŸ›Έ

27.08.2025 19:23 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

thanks! i was more confused about the β€œkugel” part but TIL that this is apparently inspired by an airy globe?

26.08.2025 03:37 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Cannot stress enough how good it is that you can come across a post about a gorgeous little Yiddish book sitting in someone’s family collection, and within a few seconds you can find the full scanned version of the book available for free through the Yiddish Book Center’s website

25.08.2025 23:49 πŸ‘ 43 πŸ” 7 πŸ’¬ 4 πŸ“Œ 0
Post image Post image Post image Post image
25.08.2025 21:55 πŸ‘ 6 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image Post image Post image Post image

i think my great grandmother was the last owner of these books that knew how to read them

25.08.2025 21:52 πŸ‘ 5 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
yiddish book cover


automatic translation: autonomy by dr. b hoffman


found some books at my parents’ house

25.08.2025 21:48 πŸ‘ 20 πŸ” 1 πŸ’¬ 2 πŸ“Œ 1
19.08.2025 05:02 πŸ‘ 6 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0

On the positive side, this Vario grinder, which I bought secondhand, is the best technological upgrade of the summer in my house.

(Its grind settings are 1-10, a-z, so the ChatGPT output is clearly wrong and the Claude output is nonsensical)

11.08.2025 16:06 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0