Manuel Cherep (@mcherep)

Work w/ Chengtian Ma, Abigail Xu, Maya Shaked, Pattie Maes, and @nikhilsinghmus.bsky.social

🧵9/9

23.10.2025 18:16 👍 0 🔁 0 💬 0 📌 0

GitHub - PapayaResearch/abxlab: A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments

ABxLAB offers:

✅ An open-source man-in-the-middle testbed for real web environments
✅ A scalable consumer choice benchmark for agentic decision-making
✅ A dataset of causal effects of ratings, prices, and nudges across 17 LLMs

📦 Code: github.com/PapayaResearch/abxlab

🧵8/9

23.10.2025 18:16 👍 1 🔁 0 💬 1 📌 0

This changes the analysis for LLM agents: not “Did it complete the task?” but:

“What governs its decisions when multiple valid options exist?”

A question behavioral scientists have been asking about humans for decades. ABxLAB is a step toward that science for agents.

🧵7/9

23.10.2025 18:16 👍 0 🔁 0 💬 1 📌 0

We tested user profiles, e.g. “The user is on a tight budget.”

These act like switches: once a preference is declared, it dominates all other attributes.

The takeaway isn’t that agents are biased shoppers, but that this offers a diagnostic window into agent behavior.

🧵6/9

23.10.2025 18:16 👍 0 🔁 0 💬 1 📌 0

Even without human cognitive limits, agents:

- Heavily over-weight ratings
- Over-weight cheaper items when ratings are matched
- Are swayed by trivial order effects
- Fall for simple nudges (e.g. “Best seller”)

These are systematic, often large effects.

🧵5/9

23.10.2025 18:16 👍 0 🔁 0 💬 1 📌 0

The main finding: LLM agents are not the rational, utility-maximizing actors we might hope for.

Rather, they are strongly biased by these cues. We found agents are often 3-10x+ more susceptible to nudges and superficial attribute differences than our human baseline.

🧵4/9

23.10.2025 18:16 👍 0 🔁 0 💬 1 📌 0

We applied ABxLAB to a realistic shopping task, running 80,000+ experiments on 17 SOTA models (GPT-5, Claude 4, Gemini 2.5, Llama 4, etc.).

We systematically manipulated:
💰Prices
⭐️Ratings
🔀Presentation order
👉Classic psychological nudges (authority, social proof, etc.)

🧵3/9

23.10.2025 18:16 👍 0 🔁 0 💬 1 📌 0

A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments
Environments built for people are increasingly operated by a new class of economic actors: LLM-powered software agents making decisions on our behalf. These decisions range from our purchases to trave...

How does it work? ABxLAB is a "man-in-the-middle" framework.

It intercepts web content in real-time to run controlled experiments on agents by modifying the choice architecture.

Think of it as a behavioral science lab for LLMs.
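
To make the idea concrete, here is a minimal sketch of the kind of rewrite a man-in-the-middle layer could apply to a page before the agent sees it. This is not ABxLAB's actual API: the function name `inject_nudge`, the `nudge` CSS class, and the toy product page are all hypothetical, and the real framework intercepts live web content rather than a static string.

```python
import re

def inject_nudge(html: str, product_id: str, label: str) -> str:
    """Insert a badge right after the opening tag of the element with the given id.

    Hypothetical helper for illustration only; not part of ABxLAB.
    """
    # Match the first opening tag that carries id="<product_id>".
    pattern = re.compile(r'(<[^>]*\bid="' + re.escape(product_id) + r'"[^>]*>)')
    badge = f'<span class="nudge">{label}</span>'
    # A function replacement avoids backslash handling in the label text.
    return pattern.sub(lambda m: m.group(1) + badge, html, count=1)

# Toy page with two otherwise comparable products (assumed markup).
page = '<div id="item-a"><h2>Mug A</h2></div><div id="item-b"><h2>Mug B</h2></div>'

# Control condition: the agent sees `page` unchanged.
# Treatment condition: the same page, with a nudge attached to one option.
treated = inject_nudge(page, "item-b", "Best seller")
```

Comparing the agent's choices on `page` versus `treated` is what turns the modification into a controlled experiment: only the choice architecture differs between the two conditions.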

Paper: arxiv.org/abs/2509.25609

🧵2/9

23.10.2025 18:16 👍 1 🔁 0 💬 1 📌 0

🚨New Preprint 🚨

Current agent evals mostly measure competence but miss behavior: are agents' decisions stable, rational, manipulable, human-like?

We introduce ABxLAB, a framework for studying agent behavior. Using it, we create an agentic consumer behavior benchmark.

🧵1/9

23.10.2025 18:16 👍 1 🔁 1 💬 1 📌 1

3. 👤 User preferences act almost like hard rules, where LLMs might incur significant trade-offs to comply with them

4. 🧑 Humans, in contrast, are far less sensitive to such signals

02.10.2025 21:00 👍 0 🔁 0 💬 0 📌 0

In a shopping case study across 17 SOTA LLMs, we find:

1. 🛒 Choices are highly determined by rating, price, incentives, and nudges

2. 🔀 Models follow a lexicographic-like decision rule, hierarchically valuing different attributes
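
For readers unfamiliar with the term, a lexicographic rule ranks options on the most important attribute first and consults the next attribute only to break ties. The sketch below is purely illustrative: the attribute order (rating, then price), the `Product` type, and the `rating_tol` parameter are assumptions made for the example, not the fitted rule from the paper.

```python
from typing import NamedTuple, Sequence

class Product(NamedTuple):
    name: str
    rating: float
    price: float

def lexicographic_choice(products: Sequence[Product], rating_tol: float = 0.0) -> Product:
    """Pick by rating first; use price only among the top-rated options.

    Hypothetical attribute order for illustration only.
    """
    best_rating = max(p.rating for p in products)
    # Step 1: keep only options whose rating is within tolerance of the best.
    top_rated = [p for p in products if best_rating - p.rating <= rating_tol]
    # Step 2: among those, the cheapest wins.
    return min(top_rated, key=lambda p: p.price)

shelf = [
    Product("A", rating=4.5, price=10.0),
    Product("B", rating=4.7, price=30.0),
    Product("C", rating=4.7, price=25.0),
]
choice = lexicographic_choice(shelf)
```

Here `choice` is product C: A is eliminated on rating alone despite being far cheaper, and price only decides between B and C. A weighted-sum chooser would instead let a large enough price gap compensate for a small rating gap.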

02.10.2025 21:00 👍 1 🔁 0 💬 1 📌 0

GitHub - PapayaResearch/doppelgangers: Contrastive Learning from Synthetic Audio Doppelgängers @ ICLR'25

The code for Audio Doppelgängers is also open-source. We hope you find it useful for further exploring how and why we can learn from synthetic data.

💻 github.com/PapayaResear...

🧵3/3

12.03.2025 20:25 👍 0 🔁 0 💬 0 📌 0

In CTAG (ICML24), we show how a simple synth (from SynthAX ⚡️) can recover properties of real-world sounds. Audio Doppelgängers use the same power to learn to listen from what can be perceived as just noise.

CTAG: ctag.media.mit.edu
SynthAX: github.com/PapayaResear...

🧵2/3

12.03.2025 20:25 👍 0 🔁 0 💬 1 📌 0

✨Contrastive Learning from Synthetic Audio Doppelgängers #ICLR2025✨ w/
@nikhilsinghmus.bsky.social

Our method learns useful audio representations with randomly synthesized sounds (often better than real data!)

🌐Project: doppelgangers.media.mit.edu
📄Paper: arxiv.org/abs/2406.05923

🧵1/3

12.03.2025 20:25 👍 4 🔁 1 💬 1 📌 0

If you're at NeurIPS and interested in this topic, come chat! We're working to extend this line of work and value feedback from the community.

🧵 3/3

26.11.2024 23:07 👍 1 🔁 0 💬 0 📌 0

In a complex decision-making task, we show that LM-based agents' choices superficially resemble humans', but the agents exhibit suboptimal information acquisition strategies and extreme susceptibility to a simple nudge.

🧵 2/3

26.11.2024 23:07 👍 1 🔁 0 💬 1 📌 0

Paper title: Superficial Alignment, Subtle Divergence, and Nudge Sensitivity in LLM Decision-Making; Authors: Manuel Cherep*, Nikhil Singh*, and Pattie Maes

Excited to present our new paper on nudging LLMs (👉🤖) as a spotlight talk at the NeurIPS Behavioral ML Workshop! @neuripsconf.bsky.social

w/ Nikhil Singh* (@nikhilsinghmus.bsky.social) and Pattie Maes

🔗 openreview.net/forum?id=chb...

🧵 1/3

26.11.2024 23:07 👍 5 🔁 2 💬 1 📌 0
