Aaron Roth
@aaroth.bsky.social
Professor at Penn, Amazon Scholar at AWS. Interested in machine learning, uncertainty quantification, game theory, privacy, fairness, and most of the intersections therein
Pinned
Aligning an AI with human preferences might be hard. But there is more than one AI out there, and users can choose which to use. Can we get the benefits of a fully aligned AI without solving the alignment problem? In a new paper we study a setting in which the answer is yes.
Reposted by Aaron Roth
Did you just miss punching your ticket to Rio or Salt Lake City? Wanna go to a conference where people will engage with you and your paper on foundations of responsible computing, and you won't get lost in the crowd?

Submit to #FORC2026, in Boston in June! Deadline in 2 weeks.
February 2, 2026 at 6:16 PM
Reposted by Aaron Roth
The other paper accepted to @iclr-conf.bsky.social 2026 🇧🇷. Our work on replicable RL sheds some light on how to consistently make decisions in RL.

@ericeaton.bsky.social @mkearnsphilly.bsky.social @aaroth.bsky.social @sikatasengupta.bsky.social @optimistsinc.bsky.social
I think I posted about it before but never with a thread. We recently put a new preprint on arxiv.

📖 Replicable Reinforcement Learning with Linear Function Approximation

🔗 arxiv.org/abs/2509.08660

In this paper, we study formal replicability in RL with linear function approximation. The... (1/6)
Replicable Reinforcement Learning with Linear Function Approximation
Replication of experimental results has been a challenge faced by many scientific disciplines, including the field of machine learning. Recent work on the theory of machine learning has formalized rep...
arxiv.org
January 26, 2026 at 4:08 PM
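The post doesn't spell out the replicability notion, so as background only: the standard definition asks that an algorithm, run on two independent samples drawn from the same distribution but with the same internal randomness, produce the exact same output with high probability. Here is a generic illustration of that idea using a common randomized-rounding construction for a mean estimate; this is my own sketch, not necessarily the algorithm in the paper.

```python
# Background illustration of statistical replicability (not the paper's algorithm):
# a mean estimate made replicable by rounding to a randomly shifted grid, so two runs
# on independent samples (sharing the same internal randomness) usually agree exactly.
import numpy as np

def replicable_mean(sample, shared_seed, grid_width=0.05):
    rng = np.random.default_rng(shared_seed)   # shared internal randomness across runs
    offset = rng.uniform(0.0, grid_width)      # random grid shift, identical in both runs
    est = float(np.mean(sample))
    return offset + grid_width * round((est - offset) / grid_width)

data_rng = np.random.default_rng(0)
run1 = replicable_mean(data_rng.normal(0.5, 1.0, 10_000), shared_seed=7)
run2 = replicable_mean(data_rng.normal(0.5, 1.0, 10_000), shared_seed=7)  # fresh data, same seed
print(run1, run2, run1 == run2)  # equal whenever both estimates fall in the same grid cell
```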
Reposted by Aaron Roth
I try to avoid posting about politics here, but I feel compelled to say some things that should be obvious: 🧵
January 25, 2026 at 5:43 PM
Reposted by Aaron Roth
The NSF has played a key role in American science, and risks being collateral damage in the war against science.
#econsky #academicsky #NSF #science
marketdesigner.blogspot.com/2026/01/hist...
History of the U.S. National Science Foundation (NSF)
marketdesigner.blogspot.com
January 12, 2026 at 1:55 PM
Reposted by Aaron Roth
The paper is here: arxiv.org/abs/2601.05245 It's joint work with @ncollina.bsky.social, Jiuyao Lu, and George Noarov. Natalie and George are on the job market --- check them out. www.seas.upenn.edu/~ncollina/ noarov.com
January 9, 2026 at 1:21 PM
Not sure of the details, but I believe it's related to the experiment that STOC ran giving feedback with a version of Gemini Deep Think, which got generally positive reviews for critiquing math: research.google/blog/gemini-...
Gemini provides automated feedback for theoretical computer scientists at STOC 2026
research.google
January 9, 2026 at 3:01 PM
What's wrong with providing access to a fancy LLM to give feedback to authors about their own papers?
January 9, 2026 at 2:35 PM
But we ended up showing that this is impossible in full generality. The results in the paper also lay out a slightly more nuanced landscape, and there remain some interesting open questions about the power of reductions from multicalibration to marginal calibration. Take a look!
January 9, 2026 at 1:21 PM
This was a fun project in part because I didn't know what the right answer was. I started out believing that there should be a rate-preserving reduction from multicalibration to marginal calibration, lifting the (unknown) minimax calibration rates to multicalibration.
January 9, 2026 at 1:21 PM
Informally, it's because you can define instances and groups/subsequences that punish the learner for any deviation from honest forecasting. The honest strategy works broadly; anything that deviates from it necessarily "overfits" to the weak marginal calibration metric.
January 9, 2026 at 1:21 PM
It could have been that the minimax rates for the two problems were identical, up to a roughly logarithmic term in the number of subsequences, which is what the upper bounds pay. What we show is that they are fundamentally different --- you can't beat the "honest" T^{2/3} rate.
January 9, 2026 at 1:21 PM
What about for multicalibration? The same kinds of techniques that get T^{2/3} rates for calibration also work for multicalibration --- Blackwell Approachability, multiobjective optimization, etc. Morally this is because the "honest" strategy also gets multicalibration.
January 9, 2026 at 1:21 PM
It is much less clear that there are strategies that let you do this profitably against a worst-case adversary --- but that's exactly what Dagan et al. showed recently to establish O(T^{2/3 - eps}) rates for marginal calibration arxiv.org/abs/2406.13668 --- that was super surprising.
Breaking the $T^{2/3}$ Barrier for Sequential Calibration
A set of probabilistic forecasts is calibrated if each prediction of the forecaster closely approximates the empirical distribution of outcomes on the subset of timesteps where that prediction was mad...
arxiv.org
January 9, 2026 at 1:21 PM
Thinking about truthful forecasting is what gets you T^{2/3} rates. But maybe you could do better, by cleverly strategizing to arrange for cancellations of the random noise with intentional bias that you inject. It's easy to see that you can do this on particular sequences.
January 9, 2026 at 1:21 PM
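Here is a cartoon of that cancellation trick (a toy of my own devising, not the construction in the papers): once the 0.7 forecast bucket has drifted away from a perfect 70% rate, a forecaster who abandons truthfulness can keep saying 0.7 on days whose true rain probability pushes the bucket back, trading honest noise for intentional, self-canceling bias.

```python
# Toy cancellation of bucket noise with injected bias (not the papers' construction).
import numpy as np

rng = np.random.default_rng(2)
days = rng.random(200) < 0.7              # 200 honest 0.7-forecast days; outcomes are noisy
excess = days.sum() - 0.7 * len(days)     # how far the 0.7 bucket has drifted from perfect calibration
print(f"excess after honest play: {excess:+.1f}")

# Deviate from the truth: keep forecasting 0.7, but pick days whose true rain
# chance (0.2 if the bucket ran hot, 0.95 if it ran cold) pushes the excess back.
inject_p = 0.2 if excess > 0 else 0.95
while abs(excess) > 0.5:
    excess += float(rng.random() < inject_p) - 0.7
print(f"excess after injecting bias: {excess:+.1f}")
```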
Suppose you knew the chance of rain. One strategy is just forecast it truthfully. The bias of your predictions would be 0, but there would be noise: sometimes when you predict a 70% chance of rain it doesn't rain. The noise is larger the less frequently you make any given prediction.
January 9, 2026 at 1:21 PM
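A quick toy simulation of that point (my own, not from the paper): truthful forecasting has zero bias, but the gap between the forecast and the realized frequency in a bucket shrinks only like one over the square root of the number of days in that bucket.

```python
# Toy simulation: truthful forecasts of a known 70% rain probability.
# Bias is zero, but each forecast bucket pays sampling noise ~ 1/sqrt(n).
import numpy as np

rng = np.random.default_rng(0)
p = 0.7  # the true chance of rain, forecast truthfully every day
for n in (10, 100, 1000, 10000):
    rained = rng.random(n) < p     # outcomes on the n days forecast at 70%
    gap = abs(rained.mean() - p)   # realized frequency vs. the forecast
    print(f"n = {n:5d}   |empirical freq - 0.7| = {gap:.3f}   (1/sqrt(n) = {1/np.sqrt(n):.3f})")
```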
Calibration asks that forecasts behave like probabilities marginally over a sequence. Amongst all the days I predict a 70% chance of rain, it should rain 70% of the time, etc. Multicalibration asks for the same guarantee simultaneously on many pre-defined subsequences.
January 9, 2026 at 1:21 PM
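To make the definitions concrete, here is a minimal sketch (my own illustration, not code from the paper) of measuring marginal calibration error and multicalibration error for a finite sequence of forecasts; the function names and the normalization choice are mine.

```python
# Illustration only: bucketed (L1) calibration error and multicalibration error.
import numpy as np

def calibration_error(forecasts, outcomes):
    # Sum over distinct forecast values v of |(rainy days forecast at v) - v * (days forecast at v)|,
    # normalized by the number of days. Zero means every forecast bucket is perfectly calibrated.
    forecasts, outcomes = np.asarray(forecasts, float), np.asarray(outcomes, float)
    err = 0.0
    for v in np.unique(forecasts):
        bucket = forecasts == v
        err += abs(outcomes[bucket].sum() - v * bucket.sum())
    return err / len(forecasts)

def multicalibration_error(forecasts, outcomes, subsequences):
    # Worst calibration error over a collection of pre-defined subsequences (boolean
    # masks over days). Normalization conventions vary in the literature; here each
    # subsequence is normalized by its own length.
    forecasts, outcomes = np.asarray(forecasts, float), np.asarray(outcomes, float)
    return max(calibration_error(forecasts[s], outcomes[s])
               for s in subsequences if np.any(s))
```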
The paper is here: arxiv.org/abs/2601.05245 It's joint work with @ncollina.bsky.social, Jiuyao Lu, and George Noarov. Natalie and George are on the job market --- check them out. www.seas.upenn.edu/~ncollina/ noarov.com
January 9, 2026 at 1:21 PM
Excited about a new paper! Multicalibration turns out to be strictly harder than marginal calibration. We prove tight Omega(T^{2/3}) lower bounds for online multicalibration, separating it from online marginal calibration for which better rates were recently discovered.
January 9, 2026 at 1:21 PM
AI-assisted papers are very good at form. They are written in the voice of an experienced researcher, and so evade our old heuristics. We need to learn a new set of red flags. These include citation errors, vague gesturing to standard results, and other things that we will learn from experience.
December 30, 2025 at 3:01 PM
Yes. We already have a set of ingrained red flags for human written papers that signal a lack of care: not citing the relevant literature, not formatting or typesetting math correctly, etc. These don't mean the paper is wrong but they strongly correlate with lack of care. But...
I'd like to propose the following norm for peer review of papers. If a paper shows clear signs of LLM-generated errors that were not detected by the author, the paper should be immediately rejected. My reasoning: 1/ #ResearchIntegrity
December 30, 2025 at 3:01 PM
Reposted by Aaron Roth
Getting absurd over at the ACM…
Some may remember this ACM guidance on inclusive terminology. E.g., as advocated by an anon ICLR reviewer, it recommends against the technical term Byzantine.

It was recently updated, and suggests avoiding "binary classification" and "stable marriage" (incorrectly defined)
December 29, 2025 at 12:05 AM
STOC ran an experiment in which authors were able to use a Gemini model to check papers for mathematical errors before submission. It got positive feedback: research.google/blog/gemini-... - it is quite good at catching mathematical errors. Obv not a replacement for peer review but a useful tool.
Gemini provides automated feedback for theoretical computer scientists at STOC 2026
research.google
December 29, 2025 at 12:20 AM
So, many things will change --- I'm convinced that AI will be transformative for mathematical research. I think the changes will go beyond the day-to-day, and will extend to how we train our students and how we disseminate our work. The future is exciting and uncertain.
December 21, 2025 at 7:01 PM
And we are already seeing that reducing the time and effort needed to produce "a paper" (not a -good- paper) is going to destabilize our existing institutions for peer review. We need to figure out how to manage researcher attention at scale and not be drowned in research slop.
December 21, 2025 at 7:01 PM
A world in which clever discoveries happen in data centers, and the role of the professional researcher is careful verification and due diligence is a world in which the job of researcher is much less fun. Many fewer people with choices would want this job, given the other costs.
December 21, 2025 at 7:01 PM