Kunal Jha
kjha02.bsky.social
CS PhD Student @University of Washington, CSxPhilosophy @Dartmouth College
Interested in MARL, Social Reasoning, and Collective Decision making in people, machines, and other organisms
kjha02.github.io
ROTE doesn’t sacrifice accuracy for speed!

While initial program generation takes time, the inferred code can be executed rapidly, making it orders of magnitude more efficient than other LLM-based methods for long-horizon predictions.
October 3, 2025 at 2:26 AM
What explains this performance gap? ROTE handles complexity better. It excels at intricate tasks in Partnr like cleaning and interacting with objects (e.g., turning items on/off), while baselines succeeded only at simpler navigation and object manipulation.
October 3, 2025 at 2:26 AM
We scaled up to the embodied robotics simulator Partnr, a complex, partially observable environment with goal-directed LLM-agents.

ROTE still significantly outperformed all LLM-based and behavior cloning baselines for high-level action prediction in this domain!
October 3, 2025 at 2:25 AM
A key strength of code: zero-shot generalization.

Programs inferred from one environment transfer to new settings more effectively than all other baselines. ROTE's learned programs transfer without needing to re-incur the cost of text generation.
October 3, 2025 at 2:25 AM
Can scripts model nuanced, real human behavior?

We collected human gameplay data and found ROTE not only outperformed all baselines but also achieved human-level performance when predicting the trajectories of real people!
October 3, 2025 at 2:25 AM
Introducing ROTE (Representing Others’ Trajectories as Executables)!

We use LLMs to generate Python programs 💻 that model observed behavior, then use Bayesian inference to select the most likely ones. The result: A dynamic, composable, and analyzable predictive representation!
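
A minimal sketch of the "select the most likely programs" step, under stated assumptions: the candidate scripts below are hand-written stand-ins for LLM-generated programs, and the 1-D world, the action set, the `eps` noise model, and the uniform prior are all hypothetical choices for illustration, not the paper's actual setup.

```python
import math

ACTIONS = ("left", "right", "stay")

# Hypothetical candidate scripts; in ROTE these would be generated by an LLM.
def go_right(pos):
    return "right"

def go_left(pos):
    return "left"

def go_to_goal(pos, goal=3):
    if pos < goal:
        return "right"
    if pos > goal:
        return "left"
    return "stay"

def likelihood(program, pos, action, eps=0.1):
    """P(action | program, pos): the script's action with prob. 1 - eps,
    otherwise noise spread evenly over the remaining actions (assumed model)."""
    if action == program(pos):
        return 1.0 - eps
    return eps / (len(ACTIONS) - 1)

def posterior(programs, trajectory):
    """Posterior over programs given (pos, action) pairs, with a uniform prior."""
    log_post = [math.log(1.0 / len(programs)) for _ in programs]
    for i, program in enumerate(programs):
        for pos, action in trajectory:
            log_post[i] += math.log(likelihood(program, pos, action))
    # Normalize in log space for numerical stability.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Observed behavior: walk right until position 3, then stop.
trajectory = [(0, "right"), (1, "right"), (2, "right"), (3, "stay")]
programs = [go_right, go_left, go_to_goal]
post = posterior(programs, trajectory)
best = programs[max(range(len(programs)), key=post.__getitem__)]
```

Once a program is selected, predicting a future step is just a function call (for example `best(5)`), with no further text generation needed.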
October 3, 2025 at 2:24 AM
Forget modeling every belief and goal! What if we represented people as following simple scripts instead (e.g., "cross the crosswalk")?

Our new paper shows that AI that models others’ minds as Python code 💻 can quickly and accurately predict human behavior!

shorturl.at/siUYI%F0%9F%...
October 3, 2025 at 2:24 AM
Why do humans prefer CEC agents? They collide less and adapt better to human behavior.
This increased adaptability reflects general norms for cooperation learned across many environments, not just memorized strategies.
April 19, 2025 at 12:09 AM
Human studies confirm our findings! CEC agents achieve higher success rates with human partners than population-based methods like FCP. They are also rated as qualitatively better collaborators than the SOTA approach (E3T), despite never having seen the level during training.
April 19, 2025 at 12:08 AM
Using empirical game theory analysis, we show CEC agents emerge as the dominant strategy in a population of different agent types during Ad-hoc Teamplay!

When diverse agents must collaborate, the CEC-trained agents are selected for their adaptability and cooperative skills.
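
One common way to run this kind of population analysis is replicator dynamics over an empirically estimated payoff matrix. The sketch below assumes that approach; the paper's actual analysis may differ. The three agent types and every payoff value are hypothetical placeholders, not measured results.

```python
def replicator_step(shares, payoffs, dt=0.1):
    """One discrete replicator update: a type grows when its expected
    payoff against the current population beats the population average."""
    n = len(shares)
    fitness = [sum(payoffs[i][j] * shares[j] for j in range(n)) for i in range(n)]
    average = sum(s * f for s, f in zip(shares, fitness))
    updated = [s + dt * s * (f - average) for s, f in zip(shares, fitness)]
    total = sum(updated)  # renormalize to guard against float drift
    return [u / total for u in updated]

# Hypothetical payoffs: payoffs[i][j] is the score type i earns when paired
# with type j. Row 0 is an adaptable type that does well with every partner.
payoffs = [
    [5.0, 4.0, 4.0],
    [3.0, 3.0, 2.0],
    [2.0, 3.0, 3.0],
]

shares = [1 / 3, 1 / 3, 1 / 3]  # start from an even mix of the three types
for _ in range(500):
    shares = replicator_step(shares, payoffs)
final = shares
```

With these placeholder payoffs, the population share of the first type approaches 1, which is the sense in which a strategy is "selected for" in a mixed population.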
April 19, 2025 at 12:08 AM
The result? CEC agents significantly outperform baselines when collaborating zero-shot with novel partners in novel environments.

Even more impressive: CEC agents outperform methods that were specifically trained on the test environment but struggle to adapt to new partners!
April 19, 2025 at 12:08 AM
We built a JAX-based procedural generator that creates billions of solvable Overcooked challenges.

Unlike prior work studying only 5 layouts, we can now study cooperative skill transfer at unprecedented scale (1.16e17 possible environments)!

Code available at: shorturl.at/KxAjW
April 19, 2025 at 12:07 AM
Our new paper (first one of my PhD!) on cooperative AI reveals a surprising insight: Environment Diversity > Partner Diversity.

Agents trained in self-play across many environments learn cooperative norms that transfer to humans on novel tasks.

shorturl.at/fqsNN%F0%9F%...
April 19, 2025 at 12:06 AM