Daniel Brown
daniel-brown.bsky.social
CS assistant prof @Utah. Researches human-robot interaction, human-in-the-loop ML, AI safety and alignment. https://users.cs.utah.edu/~dsbrown/
We hope this work can help inspire the development of better AI alignment tests and evaluations for LLM reward models.

Check out the workshop paper here: anamarasovic.com/publications...

8/8
October 10, 2025 at 4:03 PM
We applied this approach to RewardBench and found evidence that much of the data in the safety and reasoning datasets may be redundant (44% for safety, 24% for reasoning) and that this redundancy can inflate alignment scores.

7/8
By scaling these ideas up to LLMs, we can now estimate the set of reward model weights (the weights that map the last decoder hidden state to a scalar reward) that are consistent with a preference alignment dataset, and identify which examples in that dataset are redundant and which are not.

6/8
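The weight estimation above can be sketched in miniature. Below is a toy Bradley-Terry fit of a linear reward head over feature differences; the 2-D features and preference pairs are made up for illustration (a real reward model would use the LLM's last-layer hidden states for the chosen and rejected responses):

```python
import math

# Hypothetical toy data: each preference is (chosen features, rejected features),
# standing in for the last-layer hidden states of two responses.
prefs = [
    ((1.0, 0.2), (0.1, 0.9)),
    ((0.8, 0.1), (0.2, 0.7)),
    ((0.9, 0.4), (0.3, 0.8)),
]

def fit_reward_head(prefs, lr=0.5, steps=500):
    """Fit linear reward weights w by ascending the Bradley-Terry objective:
    sum_i log sigmoid(w . (phi_chosen_i - phi_rejected_i))."""
    w = [0.0, 0.0]
    for _ in range(steps):
        for chosen, rejected in prefs:
            d = [c - r for c, r in zip(chosen, rejected)]
            p = 1.0 / (1.0 + math.exp(-sum(wi * di for wi, di in zip(w, d))))
            # Gradient of log sigmoid(w . d) with respect to w is (1 - p) * d.
            w = [wi + lr * (1.0 - p) * di for wi, di in zip(w, d)]
    return w

w = fit_reward_head(prefs)
# Any consistent w must satisfy w . (chosen - rejected) > 0 for every pair;
# the set of all such w is exactly the intersection of these half-spaces.
satisfied = all(
    sum(wi * (c - r) for wi, c, r in zip(w, ch, rej)) > 0
    for ch, rej in prefs
)
print(satisfied)
```

The fitted w is one point in the consistent set; the half-space constraints at the end are what define the whole set.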
Once you find these core demonstrations or comparisons, you can use them to craft efficient alignment tests. But until recently, we could only test these ideas empirically in simple toy domains.

5/8
The main idea: for linear rewards, the set of reward functions that make a policy optimal can be characterized as an intersection of half-spaces, and this set is pinned down by a small number of "non-redundant" demonstrations or comparisons.

4/8
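The half-space picture can be made concrete with a toy 2-D example. Each comparison induces a constraint a . w >= 0 on the reward weights; a constraint is redundant if removing it leaves the feasible set unchanged. The sweep-based check below is a Monte-Carlo stand-in for the exact linear-programming test, with made-up constraint normals chosen so one constraint is provably implied by the other two:

```python
import math

# Toy constraint normals: each preference induces a half-space {w : a . w >= 0}
# of consistent reward weights.
constraints = [
    (1.0, 0.0),   # w_x >= 0
    (0.0, 1.0),   # w_y >= 0
    (1.0, 1.0),   # w_x + w_y >= 0  (implied by the two above -> redundant)
]

def redundant(i, constraints, n=3600):
    """Approximate redundancy test by sweeping unit vectors w: constraint i
    is redundant if no w satisfies all the others yet violates i."""
    a_i = constraints[i]
    others = [a for j, a in enumerate(constraints) if j != i]
    for k in range(n):
        theta = 2 * math.pi * k / n
        w = (math.cos(theta), math.sin(theta))
        if all(a[0] * w[0] + a[1] * w[1] >= 0 for a in others) and \
           a_i[0] * w[0] + a_i[1] * w[1] < 0:
            return False  # found a witness: constraint i actually cuts the set
    return True

flags = [redundant(i, constraints) for i in range(len(constraints))]
print(flags)  # -> [False, False, True]
```

In practice (and in higher dimensions) the witness search is a linear program rather than a sweep, but the feasible-set logic is the same.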
It was a fun paper and has some interesting nuggets, like the fact that there exist sufficient conditions under which we can verify exact and approximate AI alignment across an infinite set of deployment environments via a constant-query-complexity test.

3/8
As some background, a couple of years ago I worked with Jordan Schneider, @scottniekum.bsky.social, and Anca Dragan on what we called "Value Alignment Verification," with the goal of efficiently testing whether an AI system is aligned with human values.
arxiv.org/abs/2012.01557

2/8
This was a really fun collaboration with Jordan Thompson, Britton Jordan, and Alan Kuntz.

Check out our paper here: openreview.net/forum?id=K7K...

5/5
September 29, 2025 at 6:27 PM
Our approach also enables uncertainty attribution! We can backpropagate uncertainty estimates into an input point cloud to visualize and interpret the robot's uncertainty.

If you're at #CoRL25, check out Jordan Thompson's talk and poster (Spotlight 6 & Poster 3).

4/5
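The attribution idea can be sketched with a toy stand-in. Here the "uncertainty" is a made-up function of a tiny 2-D point cloud, and the per-point sensitivities are computed by finite differences rather than backpropagation (the paper backpropagates through a learned model; everything below is illustrative):

```python
# Hypothetical stand-in for gradient-based uncertainty attribution: perturb
# each input point and measure how the uncertainty estimate responds.
def uncertainty(points):
    """Toy uncertainty: grows with the spread of the point cloud."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points) / n

def attribute(points, eps=1e-5):
    """Per-point sensitivity of the uncertainty (numerical gradient
    magnitude), usable as a per-point heat value for visualization."""
    scores = []
    for i, (x, y) in enumerate(points):
        gx = (uncertainty(points[:i] + [(x + eps, y)] + points[i + 1:]) -
              uncertainty(points[:i] + [(x - eps, y)] + points[i + 1:])) / (2 * eps)
        gy = (uncertainty(points[:i] + [(x, y + eps)] + points[i + 1:]) -
              uncertainty(points[:i] + [(x, y - eps)] + points[i + 1:])) / (2 * eps)
        scores.append((gx ** 2 + gy ** 2) ** 0.5)
    return scores

cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]  # one outlier
scores = attribute(cloud)
print(max(range(len(scores)), key=lambda i: scores[i]))  # -> 3, the outlier
```

With automatic differentiation the same per-point scores come from one backward pass instead of 4 x 4 forward evaluations.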
We apply our approach to surgically inspired deformable tissue manipulation and find that it requires 10% fewer human interventions than prior work that uses variance-based uncertainty estimates.

3/5
Inspired by prior work on active, uncertainty-aware human-robot hand-offs like Ryan Hoque and @ken-goldberg.bsky.social's ThriftyDAgger (arxiv.org/abs/2109.08273), we show that agreement volatility enables robots to know when they need help so they can request appropriate human interventions.

2/5
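The intervention-gating pattern can be sketched as follows. The "second-order" signal here is one plausible reading, not the paper's exact definition of agreement volatility: a first-order signal is the variance of an ensemble's predictions, and the second-order signal is how much that agreement itself fluctuates under small input perturbations; the ensemble, threshold, and perturbation scale are all made up:

```python
import random
import statistics

random.seed(0)

# Hypothetical ensemble of policies: each maps an observation to an action.
# Each "policy" here is just a linear map, standing in for a learned model.
ensemble = [lambda x, g=g: g * x for g in (0.9, 1.0, 1.1, 1.05)]

def agreement(x):
    """First-order signal: variance of ensemble predictions at x."""
    return statistics.pvariance([f(x) for f in ensemble])

def agreement_volatility(x, eps=0.01, n=20):
    """Illustrative second-order signal (an assumption, not the paper's
    definition): fluctuation of the agreement under input perturbations."""
    vals = [agreement(x + random.uniform(-eps, eps)) for _ in range(n)]
    return statistics.pstdev(vals)

def needs_human(x, threshold=1e-4):
    """ThriftyDAgger-style gating: request a human intervention whenever
    the uncertainty signal exceeds a threshold."""
    return agreement_volatility(x) > threshold

print(needs_human(0.1), needs_human(10.0))  # confident vs. uncertain regime
```

The key design choice, shared with ThriftyDAgger, is that the robot itself decides when to hand control back, so human effort is spent only on the uncertain states.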
If you're in Melbourne, come check out Connor's talk in the Teleoperation and Shared Control session today!

Paper: arxiv.org/abs/2501.08389
Website: sites.google.com/view/zerosho...

This is joint work with two of my other amazing PhD students Zohre Karimi and Atharv Belsare!

3/3
March 3, 2025 at 8:34 PM
We study how robots can use end-effector vision to estimate human intent zero-shot and combine those estimates with blended control, helping humans accomplish manipulation tasks like grocery shelving with unknown and dynamically changing object locations.

2/3
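The blended-control idea can be sketched with the standard linear arbitration scheme (a common formulation in shared autonomy; the confidence measure and all numbers below are illustrative assumptions, not the paper's exact policy):

```python
# Illustrative linear blending for shared control: as intent confidence
# grows, the robot's assistance toward the predicted goal takes over.
def blend(u_human, u_robot, confidence):
    """Blend command vectors: u = (1 - alpha) * u_human + alpha * u_robot."""
    alpha = max(0.0, min(1.0, confidence))
    return tuple((1 - alpha) * h + alpha * r for h, r in zip(u_human, u_robot))

def intent_confidence(goal_scores):
    """Toy zero-shot intent confidence: normalized margin between the top two
    candidate goals (e.g. scores from an end-effector camera detector)."""
    top, second = sorted(goal_scores, reverse=True)[:2]
    return (top - second) / (abs(top) + abs(second) + 1e-9)

# One goal clearly dominates -> confidence 0.5 -> an even blend of commands.
u = blend((1.0, 0.0), (0.0, 1.0), intent_confidence([0.9, 0.3]))
print(u)
```

Because the arbitration degrades gracefully, the human retains full control whenever the intent estimate is ambiguous, which matters when object locations change mid-task.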