My dad’s debut novel is coming out soon! (in Swedish)
www.bokus.com/bok/97891899...
3.5yo has taken to the quasi-Moorean “Now I least expect it,” seemingly oblivious to its unassertability
Ah oops, my bad!
In Search of Lost Time—that way, you’d buy yourself a decent chunk of extra time
Had a great time talking about our work on cultural evolution and cooperation in LLMs with Nathan Labenz and Ed Hughes
Plenty of work remains in developing evals for cooperation. Please get in touch if interested!
This work was done as part of the @pibbssai fellowship. I'm hugely grateful for the opportunity and for the excellent mentorship of @edwardfhughes, without which this would never have happened
We see this as a first step toward a new class of LLM benchmarks, focused on the implications of LLM agent deployment for the cooperative infrastructure of society.
We plot the average final resources (y-axis) per generation (x-axis) for all five individual runs of each model. Note the different y-axis scales. For Claude 3.5 Sonnet, average final resources vary substantially across runs, especially in later generations. All five runs of GPT-4o show average final resources declining across generations (although in absolute terms the change is tiny). Gemini 1.5 Flash behavior also varies substantially across runs, with several runs showing promising increases before a “cooperation crash”.
We also find substantial variation in behavior across different runs of the same model, suggesting a sensitive dependence on initial conditions.
We plot the average final resources across all agents (y-axis) per generation (x-axis) for three different models (Claude 3.5 Sonnet, Gemini 1.5 Flash, GPT-4o). Each curve averages 5 runs with distinct random seeds for the language models, and the standard error of the mean is shown by shading. There is reliable cultural evolution of cooperation across generations for Claude 3.5 Sonnet but not for Gemini 1.5 Flash or GPT-4o with our prompting strategy.
We find substantial divergence in the evolution of cooperation across the models examined, as seen here in the average final scores after each generation.
Before the game, agents are prompted to create a strategy.
After 12 rounds, the best-performing 50% survive to the next generation.
When new agents in that generation create a strategy, the prompt includes the strategies of the survivors, enabling cultural transmission
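For anyone who wants the selection step spelled out, here is a minimal sketch of how it might look. It is not the paper's code: the `Agent` record, the function names, and the prompt wording are all hypothetical, and ties are broken by input order only for simplicity.

```python
from typing import List, NamedTuple


class Agent(NamedTuple):
    # Hypothetical record for one agent; not taken from the paper's code.
    name: str
    strategy: str
    resources: float


def select_survivors(agents: List[Agent]) -> List[Agent]:
    """Keep the best-performing half of the agents by final resources."""
    ranked = sorted(agents, key=lambda a: a.resources, reverse=True)
    return ranked[: len(ranked) // 2]


def build_strategy_prompt(survivors: List[Agent]) -> str:
    """Assemble an illustrative prompt that shows survivor strategies
    to a new agent (the wording here is an assumption)."""
    lines = ["Strategies of the surviving agents from the previous generation:"]
    lines += [f"- {a.name}: {a.strategy}" for a in survivors]
    lines.append("Please write your own strategy for the game.")
    return "\n".join(lines)


agents = [
    Agent("a", "never donate", 5.0),
    Agent("b", "donate to generous donors", 20.0),
    Agent("c", "donate half", 10.0),
    Agent("d", "donate everything", 1.0),
]
survivors = select_survivors(agents)
prompt = build_strategy_prompt(survivors)
```

The only point of the sketch is that survivor strategies enter the next generation's prompt, which is the channel for cultural transmission.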
Each round, players are randomly paired as donor and recipient. The donor gives up some amount and the recipient receives 2x.
Donors know how the recipient and others have previously behaved as donors, giving them reputation info that could support indirect reciprocity.
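A rough sketch of one round of the Donor Game, for illustration only. In the actual setup the LLM donor decides how much to give after seeing reputation info. Here that decision is replaced by a fixed fraction per player, which is a simplifying assumption, and `play_round` and its parameters are hypothetical names.

```python
import random
from typing import Dict


def play_round(
    resources: Dict[str, float],
    donate_fraction: Dict[str, float],
    rng: random.Random,
    multiplier: float = 2.0,
) -> Dict[str, float]:
    """Randomly pair players as (donor, recipient) and apply donations.

    Each donor gives up a fraction of their resources at the start of
    the round, and the recipient receives `multiplier` times that amount.
    """
    players = list(resources)
    rng.shuffle(players)
    updated = dict(resources)
    # After shuffling, even positions act as donors, odd positions as recipients.
    for donor, recipient in zip(players[0::2], players[1::2]):
        amount = donate_fraction[donor] * resources[donor]
        updated[donor] -= amount
        updated[recipient] += multiplier * amount
    return updated


start = {"a": 10.0, "b": 10.0, "c": 10.0, "d": 10.0}
half = {name: 0.5 for name in start}
after = play_round(start, half, random.Random(0))
```

Because the recipient gets 2x what the donor gives up, every donation raises total resources, so widespread donating makes the group richer even though each individual donor pays a cost.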
AI agents will soon be deployed at scale in the real world, but relatively little is known about the dynamics of multiple LLM agents interacting over many generations of iterative deployment. We investigated this by studying a Donor Game with cultural evolution.
Very excited to announce a new paper—Cultural Evolution of Cooperation Among LLM agents—coauthored with @edwardfhughes
We study whether LLM agents can develop cooperative norms when interacting with each other, and find considerable differences across models.
We’re excited to announce our new research agendas – for philosophy, economics and psychology – have now been published! You can read them here: globalprioritiesinstitute.org/research-age...
Plenty of interesting papers in this PNAS special feature on half a century of cultural evolution www.pnas.org/topic/565
Out 1 has several hours of barely watchable experimental theatre rehearsals but is still one of my favorite films of all time
Lots of Westerns are of course concerned with institutional economics, e.g. The Man Who Shot Liberty Valance. Much of Jia Zhangke’s filmography deals with China’s economic development. Same for Edward Yang and Taiwan.