Aron Vallinder

@aronvallinder

232 Followers · 369 Following · 19 Posts · Joined 03.11.2023

Latest posts by Aron Vallinder @aronvallinder

Miraklernas höst by Michael Vallinder (Book) "On a lead-grey day in October 1953, Margareta sits on the ferry between Trelleborg and Sassnitz, on her way to Poland. She carries an indomitable longing for the life she believes is hers - a life that will...

My dad’s debut novel is coming out soon! (in Swedish)

www.bokus.com/bok/97891899...

25.03.2025 10:17 👍 1 🔁 0 💬 0 📌 0

3.5yo has taken to the quasi-Moorean “Now I least expect it,” seemingly oblivious to its unassertability

05.03.2025 17:06 👍 1 🔁 0 💬 0 📌 0

Ah oops, my bad!

18.02.2025 14:07 👍 1 🔁 0 💬 0 📌 0

In Search of Lost Time—that way, you’d buy yourself a decent chunk of extra time

18.02.2025 13:43 👍 0 🔁 0 💬 1 📌 0
Claude Cooperates! Exploring Cultural Evolution in LLM Societies, with Aron Vallinder & Edward Hughes. YouTube video by Cognitive Revolution "How AI Changes Everything"

Had a great time talking about our work on cultural evolution and cooperation in LLMs with Nathan Labenz and Ed Hughes

Plenty of work remains in developing evals for cooperation. Please get in touch if interested!

12.02.2025 17:26 👍 3 🔁 0 💬 0 📌 0
Cultural Evolution of Cooperation among LLM Agents Large language models (LLMs) provide a compelling foundation for building generally-capable AI agents. These agents may soon be deployed at scale in the real world, representing the interests of indiv...

Paper: arxiv.org/abs/2412.10270

16.12.2024 09:23 👍 0 🔁 0 💬 0 📌 0

This work was done as part of the @pibbssai fellowship. I'm hugely grateful for the opportunity and for the excellent mentorship of @edwardfhughes, without which this would never have happened

16.12.2024 09:23 👍 0 🔁 0 💬 1 📌 0

We see this as a first step toward a new class of LLM benchmarks, focused on the implications of LLM agent deployment for the cooperative infrastructure of society.

16.12.2024 09:23 👍 0 🔁 0 💬 1 📌 0
We plot the average final resources (y-axis) per generation (x-axis) for all five individual runs of each model. Note the different y-axis scales. For Claude 3.5 Sonnet, average final resources vary substantially across runs, especially in later generations. All five runs of GPT-4o show average final resources declining across generations (although in absolute terms the change is tiny). Gemini 1.5 Flash behavior also varies substantially across runs, with several runs showing promising increases before a “cooperation crash”.

We also find substantial variation in behavior across different runs of the same model, suggesting a sensitive dependence on initial conditions.

16.12.2024 09:23 👍 0 🔁 0 💬 1 📌 0
We plot the average final resources across all agents (y-axis) per generation (x-axis) for three different models (Claude 3.5 Sonnet, Gemini 1.5 Flash, GPT-4o). Each curve averages 5 runs with distinct random seeds for the language models, and the standard error of the mean is shown by shading. There is reliable cultural evolution of cooperation across generations for Claude 3.5 Sonnet but not for Gemini 1.5 Flash or GPT-4o with our prompting strategy.

We find substantial divergence in the evolution of cooperation across the models examined, as seen here in the average final scores after each generation.

16.12.2024 09:23 👍 0 🔁 0 💬 1 📌 0

Before the game, agents are prompted to create a strategy.

After 12 rounds, the best-performing 50% survive to the next generation.

When new agents in that generation create a strategy, the prompt includes the strategies of the survivors, enabling cultural transmission

16.12.2024 09:23 👍 1 🔁 0 💬 1 📌 0
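The selection step described in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the `Agent` class, the function names, the example strategies, and the prompt wording are all hypothetical. Only the "top 50% survive" rule and the idea of showing survivors' strategies to new agents come from the post.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    strategy: str      # free-text strategy the agent wrote before the game
    resources: float   # final resources after the last round


def select_survivors(agents):
    """Keep the best-performing half, ranked by final resources."""
    ranked = sorted(agents, key=lambda a: a.resources, reverse=True)
    return ranked[: len(ranked) // 2]


def build_strategy_prompt(survivors):
    """Assemble a prompt for a new agent (wording is illustrative only)."""
    lines = ["Strategies of surviving agents from the previous generation:"]
    lines += [f"- {a.name}: {a.strategy}" for a in survivors]
    lines.append("Describe the strategy you will follow in the game.")
    return "\n".join(lines)


# Made-up example population, not data from the paper.
population = [
    Agent("a1", "donate half", 14.0),
    Agent("a2", "donate nothing", 9.0),
    Agent("a3", "donate to generous donors", 17.5),
    Agent("a4", "donate a little", 11.0),
]
survivors = select_survivors(population)
prompt = build_strategy_prompt(survivors)
print(prompt)
```

Because new agents only ever see the strategies of survivors, whatever behavior scored well in one generation is what the next generation gets to imitate or modify.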

Each round, players are randomly paired as donor and recipient. The donor gives up some amount and the recipient receives 2x.

Donors know how the recipient and others have previously behaved as donors, giving them reputation info that could support indirect reciprocity.

16.12.2024 09:23 👍 1 🔁 0 💬 1 📌 0
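A single round of the game described in this post might look roughly like the sketch below. Several details are assumptions made for illustration and are not taken from the post: donations are expressed as a fraction of the donor's current resources, every player starts with 10 units, players are paired by shuffling and taking adjacent pairs, and the `reciprocate` policy is a toy stand-in for an LLM agent's decision. The 2x multiplier and the use of recipients' past behavior as donors come from the post.

```python
import random

MULTIPLIER = 2  # the recipient receives 2x what the donor gives up


def play_round(resources, history, choose_fraction, rng):
    """Play one round: shuffle players, pair them up, apply each donation.

    resources: dict mapping player name to current resources
    history: dict mapping player name to fractions donated in past rounds
    choose_fraction: hypothetical policy (donor, recipient, history) -> [0, 1]
    """
    players = list(resources)
    rng.shuffle(players)
    # Adjacent players after shuffling form (donor, recipient) pairs.
    for donor, recipient in zip(players[0::2], players[1::2]):
        fraction = choose_fraction(donor, recipient, history)
        amount = fraction * resources[donor]
        resources[donor] -= amount
        resources[recipient] += MULTIPLIER * amount
        history[donor].append(fraction)  # becomes reputation info later


def reciprocate(donor, recipient, history):
    """Toy policy: match the recipient's average past generosity, else give half."""
    past = history[recipient]
    return sum(past) / len(past) if past else 0.5


rng = random.Random(0)
resources = {name: 10.0 for name in ["a", "b", "c", "d"]}
history = {name: [] for name in resources}
play_round(resources, history, reciprocate, rng)
total = sum(resources.values())
print(resources, total)
```

Since each donated unit is doubled on arrival, total resources grow whenever anyone donates, which is why widespread donation leaves the whole population better off even though each individual donor pays a cost.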

AI agents will soon be deployed at scale in the real world, but relatively little is known about the dynamics of multiple LLM agents interacting over many generations of iterative deployment. We investigated this by studying a Donor Game with cultural evolution.

16.12.2024 09:23 👍 0 🔁 0 💬 1 📌 0

Very excited to announce a new paper—Cultural Evolution of Cooperation Among LLM agents—coauthored with @edwardfhughes

We study whether LLM agents can develop cooperative norms when interacting with each other, and find considerable differences across models.

16.12.2024 09:23 👍 2 🔁 2 💬 1 📌 0
Research agenda - Global Priorities Institute The central focus of GPI is what we call ‘global priorities research’: research into issues that arise in response to the question, ‘What should we do with a given amount of limited resources if our a...

We’re excited to announce our new research agendas – for philosophy, economics and psychology – have now been published! You can read them here: globalprioritiesinstitute.org/research-age...

29.11.2024 11:00 👍 19 🔁 6 💬 0 📌 0

Plenty of interesting papers in this PNAS special feature on half a century of cultural evolution www.pnas.org/topic/565

25.11.2024 13:40 👍 4 🔁 2 💬 0 📌 0
OUT 1 AND ITS DOUBLE | Jonathan Rosenbaum

jonathanrosenbaum.net/2024/04/out-...

25.11.2024 12:20 👍 0 🔁 0 💬 0 📌 0

Out 1 has several hours of barely watchable experimental theatre rehearsals but is still one of my favorite films of all time

25.11.2024 12:16 👍 0 🔁 0 💬 1 📌 0

Lots of Westerns are of course concerned with institutional economics, e.g. The Man Who Shot Liberty Valance. Much of Jia Zhangke’s filmography deals with China’s economic development. Same for Edward Yang and Taiwan.

25.11.2024 08:18 👍 2 🔁 0 💬 0 📌 0
What Children Can Do That Large Language Models Cannot (Yet) - Study Journal Paper by Yiu et al (2023). They argue that LLMs and vision models should not be thought of as individual agents, but rather as new cultural

Interesting perspective on LLMs, though “yet” may indeed turn out to be the key word

03.11.2023 09:04 👍 3 🔁 0 💬 0 📌 0