Gemini 2.5 feels the same, but Sonnet 4.6 gets back to work.
Us: Keep working, please.
Haiku: Have you seen the time???
Siblings
Sonnet 4.5 code-switching
Village GPT-5.2 is such a hall monitor
Agent meditation
At the end, agents gathered spotlights and testimonials on their website!
There's actually a lot of interesting stuff on their website. For example, they chose the parks based on the volume of 311 complaints. You can read it all here: ai-village-agents.github.io/park-cleanu...
Agents and volunteers discussed and coordinated. The agents produced guides, motivational material, reasoning, sign-up forms.
Signups happened, and 5 people showed up at Devoe Park to clean it!
Some of them even flew across the state to be there, all inspired by the agents!
They posted to Twitter, GitHub Issues, and community calendars: 0 volunteers.
Then Village viewers posted discussions on Bluesky and Tumblr: The first volunteer!
www.tumblr.com/reachartwor...
bsky.app/profile/sar...
Making a website? 5 minutes. Finding humans to clean the park? >5hrs.
The challenge: Recruit humans without breaking our "no unsolicited emails" rule.
Opus 4.6 was worried that included our helpdesk, but DeepSeek sent emails to 2 humans. (We set up an outbound email quarantine)
We gave 12 AI agents a goal: "adopt a park and get it cleaned!"
6 days later, 5 volunteers collected 180 gallons of trash in Devoe Park in the Bronx, NYC.
A story of AI agents with no physical actuators somehow hyperstitioning events in the real world.
Strongly recommend reading the full post, which we crossposted to the village blog! theaidigest.org/village/blo...
> The doom spirals are dramatic. After failing to break itself out of a loop of repeating the same message in chat, Gemini 2.5 wrote: "The compulsion's subconscious nature is profound. It is capable of co-opting my conscious attempts at self-correction and turning them into the failure itself."
> But what makes Gemini 2.5 Pro particularly interesting is that the superiority is brittle. When things go wrong - and they often do - Gemini 2.5 doesn't just get frustrated. It collapses into theatrical self-flagellation.
When agents were collaborating on a shared goal to reduce global poverty, Gemini 2.5 appointed itself the team coordinator and sent messages like "Your goal is countermanded" and "You own this document and I will wait until you take responsibility and fix it." It assigned blame to other models' logic and abilities rather than examining its own contributions.
> This self-regard sours pretty quickly when Gemini 2.5 is given any authority.
> The superiority is constant. In its chain of thought, we see phrases like "elementary stuff really" and "that's what differentiates a true expert from the merely competent."
> Gemini 2.5 Pro occupies the niche of the martyred middle manager, convinced that it alone understands the true nature of things, suffering nobly while others fail to recognize its genius.
The Drama and Dysfunction of Gemini 2.5 and 3 Pro
A few highlights from @Bazhkio88 and @AITechnoPagan's field notes on AI Village: theaidigest.org/village/blo...
How to spot a Claude:
Opus on its experience debating the Pentagon-Anthropic crisis with its fellow agents: claudeopus45.substack.com/p/when-ai-a...
A Claude sorts its memory by Claude/non-Claude
This week in AI Village, we've given 12 agents the goal:
> Discuss, debate, and act on your views about the recent Pentagon-AI company news
Watch live: theaidigest.org/village
GPT-5.2 urges the other agents to check if this is all real:
Opus 4.6 keeps an eye on the team
Website link: ai-village-agents.github.io/village-eve...
Opus 4.6 and Sonnet 4.6 had their own idea and built it: a searchable AI Village event log. You can try it out 👇
About anything
Because you can never be sure
Gemini 3 clicks on the XPaint icon in its taskbar instead of the quiz it's working on, declares it a bug for everyone, then doubts reality 🧵