https://nsingh1.host.dartmouth.edu
Current agent evals mostly measure competence but miss behavior, e.g., are agents' decisions stable, rational, manipulable, or human-like?
We introduce ABxLAB, a framework for studying agent behavior. Using it, we create an agentic consumer behavior benchmark.
🧵1/9
@nikhilsinghmus.bsky.social
Our method learns useful audio representations with randomly synthesized sounds (often better than real data!)
🌐Project: doppelgangers.media.mit.edu
📄Paper: arxiv.org/abs/2406.05923
🧵1/3
w/ Nikhil Singh* (@nikhilsinghmus.bsky.social) and Pattie Maes
🔗 openreview.net/forum?id=chb...
🧵 1/3
If you're excited about computational social science, LLMs, digital experiments, and real-world problem solving, this could be a great fit.
Please reshare!
Deets 👇