David Snyder
dasny25.bsky.social
PhD Student in the IRoM Lab at Princeton University, working on safety and generalization assurances for robots.
(10/13) STEP constructs decision rules by solving an offline convex optimization problem, which yields near-optimal multidimensional decision boundaries for Nmax up to ~500-1000. During evaluation, STEP can be used almost like a look-up table!
May 9, 2025 at 8:01 PM
(9/13) Why Nmax?

Policy evaluation is expensive because hardware availability and resources for human supervision are limited. STEP near-optimally accounts for this practical constraint, and gives the evaluator significant leeway to set a conservative Nmax.
May 9, 2025 at 7:58 PM
(6/13) Yes!

We propose STEP, a sequential test that aggregates evaluation rollouts one by one and stops automatically when a desired significance level is reached. It stops quickly when the performance gap is large, and waits if the gap is small.
May 9, 2025 at 7:55 PM
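To make the "stop early when the gap is large, wait when it is small" behavior concrete, here is a minimal Python sketch of a generic sequential test. It is not STEP: it swaps in a classical Wald sequential probability ratio test (SPRT) applied to discordant rollout pairs, with a hard cap on the number of pairs standing in for Nmax. The function name `sequential_compare`, the alternative win probability `p1`, and the default values of `alpha`, `beta`, and `n_max` are all assumptions chosen for illustration.

```python
import math


def sequential_compare(pairs, alpha=0.05, beta=0.2, p1=0.75, n_max=200):
    """One-sided Wald SPRT on paired rollouts (illustrative stand-in, not STEP).

    pairs: iterable of (a_ok, b_ok) booleans, one paired rollout per item.
    H0: among discordant pairs, policy A wins with probability 0.5.
    H1: among discordant pairs, policy A wins with probability p1 (assumed).
    Returns (decision, number_of_pairs_used).
    """
    upper = math.log((1 - beta) / alpha)  # crossing this rejects H0
    lower = math.log(beta / (1 - alpha))  # crossing this accepts H0
    llr = 0.0  # running log-likelihood ratio of H1 versus H0
    used = 0
    for a_ok, b_ok in pairs:
        if used >= n_max:
            break  # evaluation budget exhausted
        used += 1
        if a_ok != b_ok:
            # Only discordant pairs carry information about the gap.
            llr += math.log(p1 / 0.5) if a_ok else math.log((1 - p1) / 0.5)
        if llr >= upper:
            return "A is better", used
        if llr <= lower:
            return "no evidence A is better", used
    return "undecided", used


# Large gap: A always succeeds and B always fails, so the test stops early.
print(sequential_compare([(True, False)] * 50))
# No gap visible: every pair is concordant, so the test runs out its budget.
print(sequential_compare([(True, True)] * 300))
```

Wald's thresholds control the error rates only approximately, and this sketch makes no attempt at the offline optimization of decision boundaries that the thread attributes to STEP.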
(1/13) How should we rigorously compare robot policies? Comparison is central to robotics research, but is inherently expensive. We introduce STEP, a flexible and data-efficient method for statistically rigorous policy comparison.
Accepted at RSS 2025: tri-ml.github.io/step/
May 9, 2025 at 7:49 PM