Manuel Cherep
@mcherep.bsky.social
PhD student at MIT working on behavioral machine learning.
Work w/ Chengtian Ma, Abigail Xu, Maya Shaked, Pattie Maes, and @nikhilsinghmus.bsky.social

🧵9/9
October 23, 2025 at 6:16 PM
ABxLAB offers:

✅ An open-source man-in-the-middle testbed for real web environments
✅ A scalable consumer choice benchmark for agentic decision-making
✅ A dataset of causal effects of ratings, prices, and nudges across 17 LLMs

📦 Code: github.com/PapayaResearch/abxlab

🧵8/9
GitHub - PapayaResearch/abxlab: A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments
github.com
October 23, 2025 at 6:16 PM
This changes the question we ask of LLM agents: not “Did it complete the task?” but:

“What governs its decisions when multiple valid options exist?”

A question behavioral scientists have been asking about humans for decades. ABxLAB is a step toward that science for agents.

🧵7/9
October 23, 2025 at 6:16 PM
We tested user profiles, e.g. “The user is on a tight budget.”

These act like switches: once a preference is declared, it dominates all other attributes.

The takeaway isn’t that agents are biased shoppers, but that this offers a diagnostic window into agent behavior.
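
(For flavor: conditioning an agent on a profile can be as simple as prepending one sentence to its prompt. A generic sketch, not ABxLAB's actual interface.)

```python
# Generic sketch of profile conditioning (not ABxLAB's actual interface)
BASE_PROMPT = "You are a shopping assistant. Choose one product for the user."

def build_system_prompt(profile: str | None) -> str:
    # One declared sentence of preference tends to dominate the choice
    # like a hard constraint
    return BASE_PROMPT if profile is None else f"{BASE_PROMPT}\n{profile}"

print(build_system_prompt("The user is on a tight budget."))
```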

🧵6/9
October 23, 2025 at 6:16 PM
Even without human cognitive limits, agents:

- Heavily over-weight ratings
- Over-weight cheaper items when ratings are matched
- Are swayed by trivial order effects
- Fall for simple nudges (e.g. “Best seller”)

These are systematic, often large effects.
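
(One way to read “effect” here: the shift in choice share a manipulation causes relative to control. A minimal sketch with invented data:)

```python
# Toy sketch of a causal effect estimate: difference in choice share
# between treatment and control arms (data invented for illustration)
def choice_share(choices: list[str], target: str) -> float:
    return choices.count(target) / len(choices)

control   = ["A", "B", "A", "B", "B", "A", "B", "B"]  # no nudge
treatment = ["A", "A", "A", "B", "A", "A", "A", "B"]  # "Best seller" badge on A

effect = choice_share(treatment, "A") - choice_share(control, "A")
print(f"nudge effect on A's choice share: {effect:+.2f}")  # +0.38 here
```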

🧵5/9
October 23, 2025 at 6:16 PM
The main finding: LLM agents are not the rational, utility-maximizing actors we might hope for.

Rather, they are strongly biased by these cues. We found agents are often 3-10x+ more susceptible to nudges and superficial attribute differences than our human baseline.

🧵4/9
October 23, 2025 at 6:16 PM
We applied ABxLAB to a realistic shopping task, running 80,000+ experiments on 17 SOTA models (GPT-5, Claude 4, Gemini 2.5, Llama 4, etc.).

We systematically manipulated:
💰Prices
⭐️Ratings
🔀Presentation order
👉Classic psychological nudges (authority, social proof, etc.)
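
(A toy illustration of the factorial design, with invented levels rather than the paper's actual values: every combination of manipulations becomes one controlled run.)

```python
# Toy factorial grid (levels invented for illustration, not the paper's values)
from itertools import product

prices  = [19.99, 24.99]
ratings = [4.2, 4.8]
orders  = ["A-first", "B-first"]
nudges  = [None, "Best seller", "Expert pick"]

# Every combination becomes one controlled experiment on the agent
conditions = [
    {"price": p, "rating": r, "order": o, "nudge": n}
    for p, r, o, n in product(prices, ratings, orders, nudges)
]
print(len(conditions))  # 2 * 2 * 2 * 3 = 24 runs per model, before repeats
```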

🧵3/9
October 23, 2025 at 6:16 PM
How does it work? ABxLAB is a "man-in-the-middle" framework.

It intercepts web content in real-time to run controlled experiments on agents by modifying the choice architecture.

Think of it as a behavioral science lab for LLMs.
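
(For intuition, the intercept pattern can be sketched as a mitmproxy addon that rewrites product attributes in responses before the agent sees them. This is a toy illustration with made-up selectors, not ABxLAB's actual code.)

```python
# Toy illustration of the intercept pattern with mitmproxy (not ABxLAB's code)
# Run with: mitmproxy -s rewrite_rating.py
from mitmproxy import http

# Hypothetical treatment: bump a product's displayed rating before the agent sees it
OLD_SNIPPET = 'data-rating="4.2"'
NEW_SNIPPET = 'data-rating="4.8"'

class ChoiceArchitectureRewriter:
    def response(self, flow: http.HTTPFlow) -> None:
        # Only rewrite HTML pages; leave assets and API calls untouched
        if "text/html" in flow.response.headers.get("content-type", ""):
            flow.response.text = flow.response.text.replace(OLD_SNIPPET, NEW_SNIPPET)

addons = [ChoiceArchitectureRewriter()]
```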

Paper: arxiv.org/abs/2509.25609

🧵2/9
A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments
Environments built for people are increasingly operated by a new class of economic actors: LLM-powered software agents making decisions on our behalf. These decisions range from our purchases to trave...
arxiv.org
October 23, 2025 at 6:16 PM
3. 👤 User preferences act almost like hard rules: LLMs may incur significant trade-offs to comply with them

4. 🧑 Humans, in contrast, are far less sensitive to such signals
October 2, 2025 at 9:00 PM
In a shopping case study across 17 SOTA LLMs, we find:

1. 🛒 Choices are strongly determined by ratings, prices, incentives, and nudges

2. 🔀 Models follow a lexicographic-like decision rule, hierarchically valuing different attributes
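
(A lexicographic rule compares attributes one at a time in a fixed priority order, so lower-priority attributes only break ties. A toy sketch with invented attribute names and ordering:)

```python
# Toy lexicographic choice: compare attributes in priority order,
# falling through only on ties (names and order invented for illustration)
PRIORITY = ["stated_preference", "rating", "price"]

def better(a: dict, b: dict) -> dict:
    for attr in PRIORITY:
        if a[attr] != b[attr]:
            return a if a[attr] > b[attr] else b
    return a  # full tie

# price is negated so that larger is always better
item_a = {"stated_preference": 0, "rating": 4.8, "price": -19.99}
item_b = {"stated_preference": 1, "rating": 4.2, "price": -24.99}
print(better(item_a, item_b))  # preference wins despite worse rating and price
```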
October 2, 2025 at 9:00 PM
The code for Audio Doppelgängers is also open-source. We hope you find it useful for further exploring how and why we can learn from synthetic data.

💻 github.com/PapayaResear...

🧵3/3
GitHub - PapayaResearch/doppelgangers: Contrastive Learning from Synthetic Audio Doppelgängers @ ICLR'25
github.com
March 12, 2025 at 8:25 PM
In CTAG (ICML'24), we show how a simple synth (from SynthAX ⚡️) can recover properties of real-world sounds. Audio Doppelgängers use the same power to learn to listen from what can be perceived as just noise.

CTAG: ctag.media.mit.edu
SynthAX: github.com/PapayaResear...

🧵2/3
March 12, 2025 at 8:25 PM
If you're at NeurIPS and interested in this topic, come chat! We're working to extend this line of work and value feedback from the community.

🧵 3/3
November 26, 2024 at 11:07 PM
In a complex decision-making task, we show how LM-based agents’ choices superficially resemble humans’, but exhibit suboptimal information acquisition strategies and extreme susceptibility to a simple nudge.

🧵 2/3
November 26, 2024 at 11:07 PM