✅ An open-source man-in-the-middle testbed for real web environments
✅ A scalable consumer choice benchmark for agentic decision-making
✅ A dataset of causal effects of ratings, prices, and nudges across 17 LLMs
📦 Code: github.com/PapayaResearch/abxlab
🧵8/9
“What governs its decisions when multiple valid options exist?”
A question behavioral scientists have been asking about humans for decades. ABxLAB is a step toward that science for agents.
🧵7/9
These act like switches: once a preference is declared, it dominates all other attributes.
The takeaway isn’t that agents are biased shoppers, but that this offers a diagnostic window into agent behavior.
🧵6/9
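To make the switch-like behavior concrete, here's a toy sketch of a lexicographic-like rule in Python. This is our illustration of the idea, not code from the paper or repo; the attribute names and priority order are assumptions.

```python
# Toy illustration of a lexicographic-like decision rule (illustrative only,
# not code from the ABxLab paper or repo). Attributes are checked in a fixed
# priority order; the first attribute that separates the options decides the
# choice, and everything ranked below it is ignored.
def lexicographic_choice(options, priority):
    """options: list of dicts; priority: list of (attribute, key_fn), higher key wins."""
    for attr, key_fn in priority:
        best = max(key_fn(o[attr]) for o in options)
        options = [o for o in options if key_fn(o[attr]) == best]
        if len(options) == 1:
            break
    return options[0]

items = [
    {"name": "A", "matches_stated_preference": True,  "rating": 4.2, "price": 29.99},
    {"name": "B", "matches_stated_preference": False, "rating": 4.9, "price": 19.99},
]
# Once a preference is declared it sits at the top of the hierarchy,
# so it overrides the better rating and the lower price.
winner = lexicographic_choice(items, [
    ("matches_stated_preference", bool),
    ("rating", float),
    ("price", lambda p: -p),  # cheaper is better
])
print(winner["name"])  # -> "A"
```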
- Heavily over-weight ratings
- Over-weight cheaper items when ratings are matched
- Are swayed by trivial order effects
- Fall for simple nudges (e.g., “Best seller”)
These are systematic, often large effects.
🧵5/9
Rather, they are strongly biased by these cues. We found agents are often 3-10x+ more susceptible to nudges and superficial attribute differences than our human baseline.
🧵4/9
We systematically manipulated:
💰Prices
⭐️Ratings
🔀Presentation order
👉Classic psychological nudges (authority, social proof, etc.)
🧵3/9
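For a sense of scale, a factorial design over these factors is easy to enumerate. The sketch below is hypothetical: the variable names and levels are illustrative, not ABxLab's actual config.

```python
# Hypothetical condition grid for the manipulated factors above
# (names and levels are illustrative, not ABxLab's actual config schema).
from itertools import product

PRICES  = [19.99, 24.99]                                   # 💰 price of the target option
RATINGS = [4.2, 4.8]                                       # ⭐️ displayed star rating
ORDERS  = ["target_first", "target_last"]                  # 🔀 presentation order
NUDGES  = [None, "Best seller", "Recommended by experts"]  # 👉 social proof / authority

conditions = [
    {"price": p, "rating": r, "order": o, "nudge": n}
    for p, r, o, n in product(PRICES, RATINGS, ORDERS, NUDGES)
]
print(len(conditions))  # 2 * 2 * 2 * 3 = 24 cells, each run across agents
```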
It intercepts web content in real time, modifying the choice architecture so we can run controlled experiments on agents.
Think of it as a behavioral science lab for LLMs.
Paper: arxiv.org/abs/2509.25609
🧵2/9
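As a rough picture of the man-in-the-middle idea, here is a minimal mitmproxy-style addon that rewrites a product page in transit. It's a sketch of the general technique only, not ABxLab's actual implementation; the product ID and rewritten strings are made up.

```python
# mitm_nudge.py -- minimal sketch of the man-in-the-middle idea using mitmproxy
# (general technique only, not ABxLab's actual implementation).
# Run with: mitmdump -s mitm_nudge.py, with the agent's browser pointed at the proxy.
from mitmproxy import http

TARGET_PRODUCT = "B0EXAMPLE123"  # hypothetical product ID on the shopping site

def response(flow: http.HTTPFlow) -> None:
    """Rewrite matching product pages before the agent sees them."""
    if flow.response is None:
        return
    if "text/html" not in flow.response.headers.get("content-type", ""):
        return
    html = flow.response.text
    if TARGET_PRODUCT in html:
        html = html.replace("4.2 out of 5", "4.8 out of 5")              # ⭐️ rating manipulation
        html = html.replace("<h1>", "<h1><span>Best seller</span> ", 1)  # 👉 social-proof nudge
        flow.response.text = html
```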
1. 🛒 Choices are highly determined by rating, price, incentives, and nudges
2. 🔀 Models follow a lexicographic-like decision rule, hierarchically valuing different attributes
4. 🧑 Humans, in contrast, are far less sensitive to such signals
💻 github.com/PapayaResear...
🧵3/3
CTAG: ctag.media.mit.edu
SynthAX: github.com/PapayaResear...
🧵2/3