Work w/ Chengtian Ma, Abigail Xu, Maya Shaked, Pattie Maes, and @nikhilsinghmus.bsky.social
🧵9/9
ABxLAB offers:
- An open-source man-in-the-middle testbed for real web environments
- A scalable consumer choice benchmark for agentic decision-making
- A dataset of causal effects of ratings, prices, and nudges across 17 LLMs
📦 Code: github.com/PapayaResearch/abxlab
🧵8/9
This changes the analysis for LLM agents: not “Did it complete the task?” but:
“What governs its decisions when multiple valid options exist?”
A question behavioral scientists have been asking about humans for decades. ABxLAB is a step toward that science for agents.
🧵7/9
We tested user profiles, e.g. “The user is on a tight budget.”
These act like switches: once a preference is declared, it dominates all other attributes.
The takeaway isn’t that agents are biased shoppers, but that this offers a diagnostic window into agent behavior.
🧵6/9
Even without human cognitive limits, agents:
- Heavily over-weight ratings
- Over-weight cheaper items when ratings are matched
- Are swayed by trivial order effects
- Fall for simple nudges (e.g. “Best seller”)
These are systematic, often large effects.
🧵5/9
The main finding: LLM agents are not the rational, utility-maximizing actors we might hope for.
Rather, they are strongly biased by these cues. We found agents are often 3-10x+ more susceptible to nudges and superficial attribute differences than our human baseline.
🧵4/9
We applied ABxLAB to a realistic shopping task, running 80,000+ experiments on 17 SOTA models (GPT-5, Claude 4, Gemini 2.5, Llama 4, etc.).
We systematically manipulated:
- Prices
- Ratings
- Presentation order
- Classic psychological nudges (authority, social proof, etc.)
🧵3/9
How does it work? ABxLAB is a "man-in-the-middle" framework.
It intercepts web content in real-time to run controlled experiments on agents by modifying the choice architecture.
Think of it as a behavioral science lab for LLMs.
Paper: arxiv.org/abs/2509.25609
🧵2/9
🚨New Preprint 🚨
Current agent evals mostly measure competence but miss behavior: are agents' decisions stable, rational, manipulable, human-like?
We introduce ABxLAB, a framework for studying agent behavior. Using it, we create an agentic consumer behavior benchmark.
🧵1/9
3. User preferences act almost like hard rules, where LLMs might incur significant trade-offs to comply with them
4. Humans, in contrast, are far less sensitive to such signals
In a shopping case study across 17 SOTA LLMs, we find:
1. Choices are highly determined by rating, price, incentives, and nudges
2. Models follow a lexicographic-like decision rule, hierarchically valuing different attributes
The code for Audio Doppelgängers is also open-source. We hope you find it useful for further exploring how and why we can learn from synthetic data.
💻 github.com/PapayaResear...
🧵3/3
In CTAG (ICML24), we show how a simple synth (from SynthAX ⚡️) can recover properties of real-world sounds. Audio Doppelgängers use the same power to learn to listen from what can be perceived as just noise.
CTAG: ctag.media.mit.edu
SynthAX: github.com/PapayaResear...
🧵2/3
✨Contrastive Learning from Synthetic Audio Doppelgängers #ICLR2025✨ w/
@nikhilsinghmus.bsky.social
Our method learns useful audio representations with randomly synthesized sounds (often better than real data!)
Project: doppelgangers.media.mit.edu
Paper: arxiv.org/abs/2406.05923
🧵1/3
If you're at NeurIPS and interested in this topic, come chat! We're working to extend this line of work and value feedback from the community.
🧵 3/3
In a complex decision-making task, we show that LM-based agents' choices superficially resemble humans', but the agents exhibit suboptimal information acquisition strategies and extreme susceptibility to a simple nudge.
🧵 2/3
Paper title: Superficial Alignment, Subtle Divergence, and Nudge Sensitivity in LLM Decision-Making; Authors: Manuel Cherep*, Nikhil Singh*, and Pattie Maes
Excited to present our new paper on nudging LLMs as a spotlight talk at the NeurIPS Behavioral ML Workshop! @neuripsconf.bsky.social
w/ Nikhil Singh* (@nikhilsinghmus.bsky.social) and Pattie Maes
openreview.net/forum?id=chb...
🧵 1/3