Kunal Jha
@kjha02.bsky.social
CS PhD Student @University of Washington, CSxPhilosophy @Dartmouth College
Interested in MARL, Social Reasoning, and Collective Decision making in people, machines, and other organisms
kjha02.github.io
Correctly tagging @aydanhuang265.bsky.social!
October 3, 2025 at 5:28 PM
For more analyses and insights, check out the paper and code: shorturl.at/siUYI

Can’t thank my collaborators @aydan_huang265, @EricYe29011995, @natashajaques.bsky.social, @maxkw.bsky.social enough for all the help and support!!!
Modeling Others’ Minds as Code
How can AI quickly and accurately predict the behaviors of others? We show an AI which uses Large Language Models to synthesize agent behavior into Python programs, then Bayesian Inference to reason a...
shorturl.at
October 3, 2025 at 2:27 AM
The big takeaway: framing behavior prediction as a program synthesis problem is an accurate, scalable, and efficient path to human-compatible AI!

It allows multi-agent systems to rapidly and accurately anticipate others' actions for more effective collaboration.
October 3, 2025 at 2:26 AM
ROTE doesn’t sacrifice accuracy for speed!

While initial program generation takes time, the inferred code can be executed rapidly, making it orders of magnitude more efficient than other LLM-based methods for long-horizon predictions.
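
To see why that amortization matters, here's a toy comparison (the interfaces are hypothetical, not our actual code): querying an LLM at every timestep pays the language-model cost over and over, while an inferred program pays it once and then predicts via cheap function calls.

```python
# Illustrative only: why amortizing LLM calls into an inferred program pays off
# over long horizons. `llm` and `act` are hypothetical stand-ins.

def predict_with_llm_each_step(llm, obs_sequence):
    # One LLM call per timestep: cost grows linearly with the horizon.
    return [llm(f"Given observation {obs}, what does the agent do next?")
            for obs in obs_sequence]

def predict_with_inferred_program(act, obs_sequence):
    # The program was inferred once up front; each step is a cheap Python call.
    return [act(obs) for obs in obs_sequence]
```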
October 3, 2025 at 2:26 AM
What explains this performance gap? ROTE handles complexity better. It excels with intricate tasks like cleaning and interacting with objects (e.g., turning items on/off) in Partnr, while baselines only showed success with simpler navigation and object manipulation.
October 3, 2025 at 2:26 AM
We scaled up to the embodied robotics simulator Partnr, a complex, partially observable environment with goal-directed LLM-agents.

ROTE still significantly outperformed all LLM-based and behavior cloning baselines for high-level action prediction in this domain!
October 3, 2025 at 2:25 AM
A key strength of code: zero-shot generalization.

Programs inferred from one environment transfer to new settings more effectively than all other baselines. ROTE's learned programs transfer without needing to re-incur the cost of text generation.
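
A toy illustration of what "transfer without re-generation" means here (hypothetical interfaces, not the paper's code): keep the already-inferred program library, optionally re-weight it on a few observations from the new environment, and never call the LLM again.

```python
# Sketch: reuse programs inferred in environment A inside environment B.
# `program_library` holds callables obs -> action inferred earlier;
# `score` is any likelihood function over (obs, action) pairs (assumption).

def reuse_in_new_env(program_library, new_trajectory, score):
    weights = [score(act, new_trajectory) for act in program_library]
    best = max(range(len(program_library)), key=lambda i: weights[i])
    return program_library[best]   # no fresh text generation needed
```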
October 3, 2025 at 2:25 AM
Can scripts model nuanced, real human behavior?

We collected human gameplay data and found ROTE not only outperformed all baselines but also achieved human-level performance when predicting the trajectories of real people!
October 3, 2025 at 2:25 AM
Introducing ROTE (Representing Others’ Trajectories as Executables)!

We use LLMs to generate Python programs 💻 that model observed behavior, then use Bayesian inference to select the most likely ones. The result: a dynamic, composable, and analyzable predictive representation!
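
Here's a rough sketch of the core loop (illustrative Python, not our released implementation; the `llm` callable and the epsilon-noise likelihood are simplifications): the LLM proposes candidate policy programs from an observed trajectory, and agreement with the observed actions is turned into a posterior over those programs.

```python
import math

def propose_programs(trajectory, llm, n_candidates=5):
    """Ask an LLM to write candidate Python policies explaining a trajectory.
    `llm` is any callable prompt -> source-code string (an assumption here)."""
    prompt = f"Write a Python function act(obs) reproducing this behavior:\n{trajectory}"
    programs = []
    for _ in range(n_candidates):
        source = llm(prompt)
        scope = {}
        exec(source, scope)            # turn generated source into a callable
        programs.append(scope["act"])
    return programs

def posterior_over_programs(programs, trajectory, eps=0.05):
    """Weight each candidate by the likelihood of the observed (obs, action)
    pairs, with epsilon noise so imperfect programs keep nonzero probability."""
    log_weights = []
    for act in programs:
        ll = 0.0
        for obs, action in trajectory:
            ll += math.log(1 - eps) if act(obs) == action else math.log(eps)
        log_weights.append(ll)
    z = max(log_weights)
    weights = [math.exp(lw - z) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

def predict_next_action(programs, posterior, obs):
    """Predict with the highest-posterior program; executing it is cheap."""
    best = max(range(len(programs)), key=lambda i: posterior[i])
    return programs[best](obs)
```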
October 3, 2025 at 2:24 AM
Traditional AI is stuck! Predicting behavior is either brittle (Behavior Cloning) or too slow, requiring endless belief-space enumeration (Inverse Planning).

How can we avoid mental state dualism while building scalable, robust predictive models?
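
For intuition, here's a generic sketch (not from the paper) of why classic Bayesian inverse planning gets expensive: every hypothesized goal or belief needs its own planner call before the observed actions can even be scored, so cost grows with the size of the hypothesis space.

```python
import math

def inverse_planning_posterior(trajectory, hypotheses, plan, eps=0.05):
    """hypotheses: candidate goals/beliefs; plan(h) -> policy mapping obs -> action.
    The planner call inside the loop is the bottleneck as the hypothesis
    space grows (e.g., all goals x all belief states)."""
    log_weights = []
    for h in hypotheses:
        policy = plan(h)               # expensive: full planning per hypothesis
        ll = sum(math.log(1 - eps) if policy(obs) == a else math.log(eps)
                 for obs, a in trajectory)
        log_weights.append(ll)
    z = max(log_weights)
    w = [math.exp(lw - z) for lw in log_weights]
    s = sum(w)
    return [x / s for x in w]
```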
October 3, 2025 at 2:24 AM
For the paper and code, see kjha02.github.io/publication/...

Thank you so much to my collaborators @cogscikid.bsky.social @liangyanchenggg @simon-du.bsky.social @maxkw.bsky.social @natashajaques.bsky.social for making the first publication of my PhD a fun one!!!
Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination
How can AI develop the ability to cooperate with novel people on novel problems? We show AI learning to cooperate in “self-play” with one partner on many environments helps agents meta-learn to cooper...
kjha02.github.io
April 19, 2025 at 12:11 AM
The big takeaway: Environment diversity > Partner diversity

Training across diverse tasks teaches agents how to cooperate, not just whom to cooperate with. This enables zero-shot coordination with novel partners in novel environments, a critical step toward human-compatible AI.
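
Schematically, the CEC recipe looks like this (interfaces are hypothetical; the point is that diversity comes from sampling a fresh layout every episode, not from a population of partners):

```python
import random

def train_cec(agent, sample_env, n_episodes=100_000):
    for _ in range(n_episodes):
        env = sample_env(seed=random.randrange(2**31))  # new layout each episode
        obs = env.reset()
        done = False
        while not done:
            # Self-play: the same policy controls both players.
            actions = {player: agent.act(obs[player]) for player in obs}
            obs, rewards, done, _ = env.step(actions)
            agent.update(obs, rewards)
    return agent
```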
April 19, 2025 at 12:09 AM
Our work used NiceWebRL, a Python-based package we helped develop for evaluating Human, Human-AI, and Human-Human gameplay on Jax-based RL environments!

This tool makes crowdsourcing data for CS and CogSci studies easier than ever!

Learn more: github.com/wcarvalho/ni...
GitHub - wcarvalho/nicewebrl: Python library for easily making web Apps to compare humans and AI
github.com
April 19, 2025 at 12:09 AM
Why do humans prefer CEC agents? They collide less and adapt better to human behavior.
This increased adaptability reflects general norms for cooperation learned across many environments, not just memorized strategies.
April 19, 2025 at 12:09 AM
Human studies confirm our findings! CEC agents achieve higher success rates with human partners than population-based methods like FCP, and are rated qualitatively better to collaborate with than the SOTA approach (E3T), despite never having seen the level during training.
April 19, 2025 at 12:08 AM
Using empirical game theory analysis, we show CEC agents emerge as the dominant strategy in a population of different agent types during Ad-hoc Teamplay!

When diverse agents must collaborate, the CEC-trained agents are selected for their adaptability and cooperative skills.
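
For anyone unfamiliar with this kind of analysis: the standard tool is replicator dynamics over an empirical payoff matrix estimated from cross-play. A small illustration with placeholder payoffs (not our actual numbers) is below.

```python
import numpy as np

def replicator_dynamics(payoff, steps=10_000, lr=0.01):
    """payoff[i][j]: average return of agent type i when paired with type j,
    estimated from cross-play evaluations. Returns the final population mix."""
    n = payoff.shape[0]
    pop = np.full(n, 1.0 / n)                  # start with a uniform population
    for _ in range(steps):
        fitness = payoff @ pop                 # expected payoff of each type
        avg = pop @ fitness
        pop += lr * pop * (fitness - avg)      # types above average grow
        pop = np.clip(pop, 1e-12, None)
        pop /= pop.sum()
    return pop

# Example with made-up payoffs for three agent types (e.g., CEC, FCP, E3T):
payoffs = np.array([[1.0, 0.9, 0.8],
                    [0.7, 0.8, 0.5],
                    [0.6, 0.5, 0.7]])
print(replicator_dynamics(payoffs))            # mass concentrates on the dominant type
```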
April 19, 2025 at 12:08 AM
The result? CEC agents significantly outperform baselines when collaborating zero-shot with novel partners on novel environments.

Even more impressive: CEC agents outperform methods that were specifically trained on the test environment but struggle to adapt to new partners!
April 19, 2025 at 12:08 AM