Raj Ghugare
@raj-ghugare.bsky.social
PhD student at Princeton University
Collaborators' Blue Skies!

Catherine Ji - crji.bsky.social
Jin Schofield - Jinschofield.bsky.social
October 16, 2025 at 11:22 PM
Check out our website and paper for detailed results and discussions!

Website: rajghugare19.github.io/builderbench...
Paper: arxiv.org/abs/2510.06288
Code: github.com/rajghugare19...

With amazing collaborators Catherine Ji, Kathryn Wantlin, Jin Schofield, @ben-eysenbach.bsky.social.
October 16, 2025 at 11:08 PM
There are many open questions and exciting opportunities for future work; we have noted down a subset of them. I am excited to see the progress in research on self-supervised exploration and open-ended learning that BuilderBench enables!
October 16, 2025 at 11:08 PM
We tried using LLMs with inference-time thinking to solve some of our tasks. The models were prompted with the problem setting, given an example solution, and asked to generate a high-level plan in language. Neither ChatGPT nor Gemini was able to provide a correct plan for any of the tasks!
October 16, 2025 at 11:08 PM
Training on the test goals improves both the returns and the success rates achieved by the best agents. However, as the number of cubes and the complexity of the tasks increase, current algorithms are unable to achieve non-zero success.
October 16, 2025 at 11:08 PM
We open-source single-file implementations of 6 representative reinforcement learning (RL) and self-supervised exploration algorithms. Most of the complex tasks remain unsolved by the purely self-supervised algorithms we tried.
October 16, 2025 at 11:08 PM
To evaluate open-ended exploration and generalization, we design a self-supervised protocol in which agents explore the world and learn to solve autotelic (self-proposed) tasks. To provide additional feedback for research, we also include a “training-wheels” protocol in which agents are trained directly on the test goals. See the sketch below.
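To make the distinction concrete, here is a minimal sketch of the two protocols. All names here (agent.propose_goal, agent.act, agent.update, env.success) are illustrative placeholders, not BuilderBench's actual API.

```python
import random

def evaluate(env, agent, goal, horizon=500):
    # Roll out the agent on one goal and check an assumed success predicate.
    obs = env.reset()
    for _ in range(horizon):
        obs = env.step(agent.act(obs, goal))
    return env.success(goal)

def self_supervised_protocol(env, agent, steps, test_goals):
    # Phase 1: reward-free exploration. The agent proposes its own
    # (autotelic) goals and practices reaching them; it never sees the
    # test goals during training.
    obs = env.reset()
    for _ in range(steps):
        goal = agent.propose_goal(obs)
        obs = env.step(agent.act(obs, goal))
        agent.update(obs, goal)
    # Phase 2: zero-shot evaluation on held-out test structures.
    return [evaluate(env, agent, g) for g in test_goals]

def training_wheels_protocol(env, agent, steps, test_goals):
    # Diagnostic variant: train directly on the test goals, removing
    # the exploration and generalization challenge.
    obs = env.reset()
    for _ in range(steps):
        goal = random.choice(test_goals)
        obs = env.step(agent.act(obs, goal))
        agent.update(obs, goal)
    return [evaluate(env, agent, g) for g in test_goals]
```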
October 16, 2025 at 11:08 PM
The BuilderBench simulator is built on MuJoCo and JAX. It is hardware-accelerated and enables RL training 10 to 100 times faster than purely CPU-based open-ended benchmarks (e.g., training a PPO agent to stack two blocks takes 30 minutes on a single GPU).
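For intuition on where the speedup comes from, here is a toy JAX sketch (placeholder dynamics, not the actual MuJoCo physics step): jax.vmap batches thousands of environments and jax.lax.scan unrolls time, so the whole rollout compiles into a single GPU program with no per-step CPU round-trips.

```python
import jax
import jax.numpy as jnp

def env_step(state, action):
    # Placeholder dynamics standing in for a JAX-traceable physics step.
    new_state = state + 0.01 * action
    return new_state, new_state          # (carry, per-step output)

@jax.jit
def rollout(init_states, actions):
    # vmap batches environments; lax.scan unrolls the time axis inside
    # one compiled program.
    def single_env(state0, acts):
        return jax.lax.scan(env_step, state0, acts)
    return jax.vmap(single_env)(init_states, actions)

states = jnp.zeros((4096, 8))            # 4096 parallel envs, 8-dim state
actions = jnp.zeros((4096, 100, 8))      # 100 steps per env
final_states, trajectories = rollout(states, actions)
print(trajectories.shape)                # (4096, 100, 8)
```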
October 16, 2025 at 11:08 PM
BuilderBench is meant to develop agents that can build any structure out of building blocks. Tasks require not only motor skills but also higher-level skills such as logical and geometric reasoning, intuitive physics, and reasoning about counterweights, buttresses, and temporary scaffolding!
October 16, 2025 at 11:08 PM
Scalable learning mechanisms for agents that solve novel tasks via experience remain an open problem. We argue that a key reason is the lack of suitable benchmarks.

Happy to share BuilderBench, a benchmark to accelerate research in pre-training that centers learning from experience.
October 16, 2025 at 11:08 PM
Work done with my amazing advisor and collaborator @ben-eysenbach.bsky.social

Paper -> arxiv.org/abs/2505.23527
Code -> github.com/Princeton-RL...
Project Page -> rajghugare19.github.io/nf4rl/
June 5, 2025 at 5:06 PM
I’m excited about the possibilities this opens up for RL and robotics. Large-scale behavioral models can be trained directly via likelihoods—without expensive sampling (diffusion) or discretization errors (transformers). They can also be fine-tuned directly using exact MaxEnt RL!
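For reference, the standard MaxEnt RL objective that exact likelihoods make tractable (α is the entropy temperature):

```latex
J(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[ \sum_t r(s_t, a_t)
       + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big)
       = -\mathbb{E}_{a \sim \pi}\big[ \log \pi(a \mid s) \big]
```

Because an NF policy gives exact log π(a|s) for its own samples, the entropy term needs no lower bounds or auxiliary approximations.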
June 5, 2025 at 5:06 PM
On unsupervised goal-conditioned RL: a simple goal-sampling strategy that uses the NF's ability to provide density estimates outperforms supervised oracles like contrastive RL on 3 standard exploration tasks.
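A hedged sketch of what such a density-based goal-sampling rule could look like; the flow.log_prob interface and the softmax scoring are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def sample_exploration_goal(flow, candidate_goals, temperature=1.0):
    # flow.log_prob is assumed to return the NF's exact log-density of
    # each candidate under the distribution of previously visited states.
    log_p = flow.log_prob(candidate_goals)    # shape: (num_candidates,)
    scores = -log_p / temperature             # low density => novel goal
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                      # softmax over candidates
    idx = np.random.choice(len(candidate_goals), p=probs)
    return candidate_goals[idx]
```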
June 5, 2025 at 5:06 PM
On offline RL: NF-RLBC outperforms strong baselines like flow matching and diffusion-based Q-learning on half the tasks. This boost comes from simply replacing the Gaussian policy with an NF in the SAC+BC recipe. Unlike diffusion, no distillation or importance sampling is needed.
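A minimal sketch of that actor update, assuming an NF policy object with a reparameterized sample() and an exact log_prob(); the names and the weighting are illustrative, not the paper's code.

```python
import torch

def nf_actor_loss(nf_policy, q_net, obs, dataset_actions, alpha=1.0):
    # Q term: maximize the critic's value of reparameterized policy samples.
    actions = nf_policy.sample(obs)
    q_term = q_net(obs, actions).mean()
    # BC term: exact maximum likelihood of the dataset actions. The NF
    # gives this log-prob in closed form, so no distillation and no
    # importance sampling are needed.
    bc_term = nf_policy.log_prob(dataset_actions, obs).mean()
    return -(q_term + alpha * bc_term)
```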
June 5, 2025 at 5:06 PM
On conditional imitation learning: (left) just exchanging a Gaussian policy for an NF policy leads to significant improvements. (right) NF-GCBC outperforms the flow-matching policies, as well as dedicated offline RL algorithms like IQL and quasimetric RL.
June 5, 2025 at 5:06 PM
On imitation learning: NF-BC is competitive with diffusion and transformer policies, which are the go-to models for imitation learning today. But NF-BC requires fewer hyperparameters (no SDEs, no noise scheduling, no discrete representations).
June 5, 2025 at 5:06 PM
Combining this architecture with canonical RL algorithms leads to a strong and simple recipe. On 82 tasks spanning 5 settings, including imitation learning and offline, goal-conditioned, and unsupervised RL, this recipe rivals or surpasses strong baselines that use diffusion, flow matching, or transformers.
June 5, 2025 at 5:06 PM
Stacking affine coupling networks, permutation layers, and LayerNorm results in a simple and scalable architecture. It integrates seamlessly with canonical imitation learning, offline RL, goal-conditioned RL, and unsupervised RL algorithms. A compact sketch is below.
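A compact PyTorch sketch of that stack. The coupling and permutation layers follow the standard RealNVP-style construction; the invertible LayerNorm-style normalization between blocks is omitted for brevity, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    # One coupling block: half the dims pass through unchanged and
    # predict an elementwise scale/shift for the other half, so the
    # Jacobian log-determinant is just the sum of the scales.
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        scale, shift = self.net(x1).chunk(2, dim=-1)
        y2 = x2 * torch.exp(scale) + shift
        return torch.cat([x1, y2], dim=-1), scale.sum(-1)

class SimpleFlow(nn.Module):
    # Stack of couplings with fixed random permutations in between so
    # every dimension eventually gets transformed.
    def __init__(self, dim, num_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList(AffineCoupling(dim) for _ in range(num_blocks))
        self.perms = [torch.randperm(dim) for _ in range(num_blocks)]

    def forward(self, x):
        log_det = x.new_zeros(x.shape[0])
        for block, perm in zip(self.blocks, self.perms):
            x, ld = block(x[:, perm])
            log_det = log_det + ld
        return x, log_det
```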
June 5, 2025 at 5:06 PM
We believe their underuse is due to the (mis)conception that they have overly restrictive architectures or suffer from training instabilities. We revisit one of the simplest NF architectures and show that it can yield strong results across diverse RL problem settings.
June 5, 2025 at 5:06 PM
The core of most RL algorithms is just likelihood estimation, sampling, and variational inference (see attached image). NFs can do all three efficiently! This raises the question: why don't we see them used more commonly in RL?
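A tiny 1-D illustration of the change-of-variables identity behind this, using an affine flow through a standard normal. Any invertible, differentiable map with a tractable Jacobian works the same way; all constants here are made up for the example.

```python
import numpy as np

mu, sigma = 2.0, 0.5                      # flow: x = f(z) = mu + sigma * z

def log_prob(x):
    # Change of variables: log p_X(x) = log p_Z(f^{-1}(x)) - log|df/dz|
    z = (x - mu) / sigma                  # exact inverse
    log_pz = -0.5 * (z ** 2 + np.log(2 * np.pi))
    return log_pz - np.log(sigma)         # exact likelihood, no bounds

def sample(n, rng=None):
    rng = rng or np.random.default_rng(0)
    return mu + sigma * rng.standard_normal(n)   # one forward pass

x = sample(5)
print(log_prob(x))   # exact densities enable MLE training and exact
                     # entropy terms in variational / MaxEnt objectives
```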
June 5, 2025 at 5:06 PM