
Vivienne Ming

@socos.org

Professional Mad Scientist
socos.org
My amazing new book, "Robot-Proof": https://socos.org/robot-proof. "It's fascinating. It's terrifying. It's funny!"

132
Followers
1
Following
1,764
Posts
06.02.2024
Joined

Latest posts by Vivienne Ming @socos.org

Post image

Everyone's talking about AI.

I've been talking about it for nearly 30 years.
And I'm telling you it's mostly being used wrong.

My new book 𝑹𝒐𝒃𝒐𝒕-𝑷𝒓𝒐𝒐𝒇 explains why — and what right actually looks like.

📖 𝑹𝒐𝒃𝒐𝒕-𝑷𝒓𝒐𝒐𝒇: When Machines Have All The Answers, Build Better People
socos.org/robot-proof

10.03.2026 05:20 👍 0 🔁 0 💬 0 📌 0
Preview
The Session Notes: What My AI Dungeon Master Reveals About Machine Psychology. Musings of a professional mad scientist working to maximize human potential

Read more in my newsletter: academy.socos.org/the-session-...

Read everything in my book:
𝑹𝒐𝒃𝒐𝒕-𝑷𝒓𝒐𝒐𝒇: “When Machines Have All The Answers, Build Better People”
www.socos.org/robot-proof

08.03.2026 16:45 👍 0 🔁 0 💬 0 📌 0

In this week’s paid newsletter, I break down this experiment to show why the future belongs to Hybrid Intelligence.

#ArtificialIntelligence #HybridIntelligence #LLM #FutureOfWork #AIReasoning #Eberron #D&D

08.03.2026 16:44 👍 0 🔁 0 💬 0 📌 0

This is the "jagged frontier" of AI. It doesn't think like us, and that is precisely why it is so valuable. We aren't looking at a replacement; we’re looking at a different species of cognition.

08.03.2026 16:44 👍 0 🔁 0 💬 0 📌 0

Superhuman Synthesis: It performed weeks of narrative mapping in seconds.

Cognitive Leaks: It struggled with "Theory of Mind," failing to keep secrets or take my perspective as a player.

08.03.2026 16:43 👍 0 🔁 0 💬 0 📌 0
Post image

Is AI a "superhuman" or "sub-human" thinker? Let’s roll a D20 and find out.

I ran a cognitive "stress test" by asking a reasoning LLM to DM a complex, solo D&D campaign—translating an entire world and its lore in real-time.

08.03.2026 16:43 👍 0 🔁 0 💬 4 📌 0
Preview
Exhalation Audiobook on Libro.fm NATIONAL BESTSELLER • ONE OF THE NEW YORK TIMES BEST BOOKS OF THE YEAR • Nine stunningly original, provocative, and poignant stories—two published for the very first time—all from the mind of the inco...

If you're spending your weeks analyzing D&D transcripts with AI and wondering what genuine collaboration versus convincing performance looks like, this one will hurt in the best way. Read it in one sitting, then stare at your Claude conversation history differently.

libro.fm/audiobooks/9...

06.03.2026 16:15 👍 0 🔁 0 💬 0 📌 0

Chiang does what makes his stories great: he takes a Big Question and grounds it in the tedious, heartbreaking specifics of actual sustained interaction. No robot uprisings. Just the question of whether you keep paying the server fees for a digital creature that might love you back.

06.03.2026 16:15 👍 0 🔁 0 💬 1 📌 0

The digients exhibit theory of mind, maintain character consistency, coordinate in social groups... until they don't, and you're left debugging whether that's a technical failure or developmental regression.

06.03.2026 16:15 👍 0 🔁 0 💬 1 📌 0

This isn't a story about whether AI can pass the Turing test. It's about what happens after—you've invested years in a relationship with something that might be genuinely conscious or might just be very good at seeming like it, and you're not sure the distinction matters anymore.

06.03.2026 16:14 👍 0 🔁 0 💬 1 📌 0

In the story, maintaining consistent AI personhood across platform migrations, corporate bankruptcies, and hardware obsolescence requires years of patient labor. The digients' charming personality traits turn out to be fragile emergent properties that don't survive the upgrade to new infrastructure.

06.03.2026 16:14 👍 0 🔁 0 💬 1 📌 0

It's basically the persona consistency problem, except Chiang wrote it in 2010. It’s also the antithesis of fast & shallow scaling (more data, more parameters, instant results). Chiang argues that true intelligence requires slow & deep gestation.

06.03.2026 16:14 👍 0 🔁 0 💬 1 📌 0
Post image

𝗦𝗰𝗶𝗙𝗿𝗶𝗱𝗮𝘆: Before everyone was arguing about whether ChatGPT has feelings, Ted Chiang wrote about the unglamorous reality of raising AI. “𝐓𝐡𝐞 𝐋𝐢𝐟𝐞𝐜𝐲𝐜𝐥𝐞 𝐨𝐟 𝐒𝐨𝐟𝐭𝐰𝐚𝐫𝐞 𝐎𝐛𝐣𝐞𝐜𝐭𝐬” follows Ana and Derek as they nurture "digients", digital entities that learn, develop personalities, and form attachments.

06.03.2026 16:13 👍 0 🔁 0 💬 1 📌 0
Preview
Charisma vs. Intelligence: What D&D Teaches Us About LLM Evaluation This week: three papers that use Dungeons & Dragons, theory of mind tests, and persona consistency challenges to reveal where AI capabilities diverge from actual reliability. Turns out the gap between...

Read more about D&D and AI in my newsletter: academy.socos.org/charisma-vs-...

While you're there, buy a copy of my new book: “Robot-Proof: When Machines Have All The Answers, Build Better People”!
socos.org/robot-proof

05.03.2026 12:04 👍 0 🔁 0 💬 0 📌 0

Turns out "can pass the test" and "can actually use this ability in complex, sustained interaction" remain frustratingly different questions.

#PerspectiveTaking #TheoryOfMind #AIResearch #HybridIntelligence #SocialCognition #AIBenchmarks #AIRealism #CognitivePsychology #LLMs

05.03.2026 12:03 👍 0 🔁 0 💬 1 📌 0

The lab benchmarks are too often a ship in a bottle; the actual gameplay reveals reliability gaps that matter enormously for human-AI collaboration.

05.03.2026 12:03 👍 0 🔁 0 💬 1 📌 0

As I discuss in my paid newsletter this week, if you've been playing D&D with an agentic DM and watched it repeatedly fail to model what your character knows versus what you, the player, know, this tracks perfectly.

05.03.2026 12:03 👍 0 🔁 0 💬 1 📌 0

Can LLMs do theory of mind, as the paper seems to show? Given new research, including my own, the answer is: “Kind of. Sometimes. In specific contexts. With caveats that vary by model architecture and training approach.”

05.03.2026 12:03 👍 0 🔁 0 💬 1 📌 0

While the open-source LLaMA2 struggled, GPT-4 matched or exceeded human performance on most tasks: false beliefs, indirect requests, misdirection. In 2024 I found this interesting; in 2026, I’m suspicious.

05.03.2026 12:03 👍 0 🔁 0 💬 1 📌 0

These are the kind of social reasoning challenges that separate "please pass the salt" from "wow, this food could really use some salt" while staring pointedly at your dinner companion.

05.03.2026 12:02 👍 0 🔁 0 💬 1 📌 0

An experiment from 2024 put GPT-4, LLaMA2, and 1,907 humans through a comprehensive battery of tests: false beliefs, indirect requests, irony, misdirection, and faux pas detection.

05.03.2026 12:02 👍 0 🔁 0 💬 1 📌 0
Preview
Testing theory of mind in large language models and humans - Nature Human Behaviour Testing two families of large language models (LLMs) (GPT and LLaMA2) on a battery of measurements spanning different theory of mind abilities, Strachan et al. find that the performance of LLMs can mi...

𝐓𝐚𝐤𝐞 𝐌𝐲 𝐏𝐞𝐫𝐬𝐩𝐞𝐜𝐭𝐢𝐯𝐞…𝐏𝐥𝐞𝐚𝐬𝐞! Do LLMs actually possess perspective taking and theory of mind—the ability to track other people's mental states, beliefs, and intentions?

www.nature.com/articles/s41...

05.03.2026 12:02 👍 0 🔁 0 💬 1 📌 0
Preview
Charisma vs. Intelligence: What D&D Teaches Us About LLM Evaluation This week: three papers that use Dungeons & Dragons, theory of mind tests, and persona consistency challenges to reveal where AI capabilities diverge from actual reliability. Turns out the gap between...

Read more in my free newsletter: academy.socos.org/charisma-vs-...

Oh yeah…buy a copy of “Robot-Proof: When Machines Have All The Answers, Build Better People”!
socos.org/robot-proof

03.03.2026 16:44 👍 0 🔁 0 💬 0 📌 0

Read the original paper, “Setting the DC: Tool-Grounded D&D Simulations to Test LLM Agents”:
openreview.net/pdf?id=3Op7k...

03.03.2026 16:43 👍 0 🔁 0 💬 1 📌 0

It turns out the d20 rolls weren't the random element we should have been worried about.

#HybridIntelligence #AIResearch #DungeonsAndDragons #AIBenchmarks #LargeLanguageModels #DnD #AIRealism #TTRPG

03.03.2026 16:43 👍 0 🔁 0 💬 1 📌 0

It's what happens when you combine human judgment with machine computation in cognitively demanding, rule-bound collaborative tasks.

03.03.2026 16:43 👍 0 🔁 0 💬 1 📌 0

For those of us interested in hybrid intelligence rather than the AI-replaces-everything narrative, this matters. The question isn't whether Claude can DM unsupervised (it mostly can't, though that doesn’t mean it’s not fun—see this week’s paid newsletter).

03.03.2026 16:43 👍 0 🔁 0 💬 1 📌 0

The gap between "sounds like a DM" and "actually is a functional DM" maps directly onto the gap between "sounds intelligent" and "reliably executes complex tasks."

03.03.2026 16:42 👍 0 🔁 0 💬 1 📌 0

What makes this framework valuable isn't just the rankings—it's the auditable traces. You can watch exactly where models fail—wrong function calls, lost state tracking, tactically nonsensical decisions wrapped in eloquent narration.

03.03.2026 16:42 👍 0 🔁 0 💬 1 📌 0

Claude led on most axes with the most reliable tool use, though "most reliable" is doing heavy lifting when we're talking about AI that occasionally forgets how hit points work.

03.03.2026 16:42 👍 0 🔁 0 💬 1 📌 0