Aaron Roth
@aaroth.bsky.social
Professor at Penn, Amazon Scholar at AWS. Interested in machine learning, uncertainty quantification, game theory, privacy, fairness, and most of the intersections therein
Pinned
Aligning an AI with human preferences might be hard. But there is more than one AI out there, and users can choose which to use. Can we get the benefits of a fully aligned AI without solving the alignment problem? In a new paper we study a setting in which the answer is yes.
Reposted by Aaron Roth
Did you just miss punching your ticket to Rio or Salt Lake City? Wanna go to a conference where people will engage with you and your paper on foundations of responsible computing, and you won't get lost in the crowd?

Submit to #FORC2026, in Boston in June! Deadline in 2 weeks.
February 2, 2026 at 6:16 PM
Reposted by Aaron Roth
The other paper accepted to @iclr-conf.bsky.social 2026 🇧🇷. Our work on replicable RL sheds some light on how to consistently make decisions in RL.

@ericeaton.bsky.social @mkearnsphilly.bsky.social @aaroth.bsky.social @sikatasengupta.bsky.social @optimistsinc.bsky.social
I think I posted about it before but never with a thread. We recently put a new preprint on arxiv.

📖 Replicable Reinforcement Learning with Linear Function Approximation

🔗 arxiv.org/abs/2509.08660

In this paper, we study formal replicability in RL with linear function approximation. The... (1/6)
Replicable Reinforcement Learning with Linear Function Approximation
Replication of experimental results has been a challenge faced by many scientific disciplines, including the field of machine learning. Recent work on the theory of machine learning has formalized rep...
arxiv.org
January 26, 2026 at 4:08 PM
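The post doesn't spell out the replicability notion, so as background only: the standard definition asks that an algorithm, run on two independent samples drawn from the same distribution but with the same internal randomness, produce the exact same output with high probability. Here is a generic illustration of that idea using a common randomized-rounding construction for a mean estimate; this is my own sketch, not necessarily the algorithm in the paper.

```python
# Background illustration of statistical replicability (not the paper's algorithm):
# a mean estimate made replicable by rounding to a randomly shifted grid, so two runs
# on independent samples (sharing the same internal randomness) usually agree exactly.
import numpy as np

def replicable_mean(sample, shared_seed, grid_width=0.05):
    rng = np.random.default_rng(shared_seed)   # shared internal randomness across runs
    offset = rng.uniform(0.0, grid_width)      # random grid shift, identical in both runs
    est = float(np.mean(sample))
    return offset + grid_width * round((est - offset) / grid_width)

data_rng = np.random.default_rng(0)
run1 = replicable_mean(data_rng.normal(0.5, 1.0, 10_000), shared_seed=7)
run2 = replicable_mean(data_rng.normal(0.5, 1.0, 10_000), shared_seed=7)  # fresh data, same seed
print(run1, run2, run1 == run2)  # equal whenever both estimates fall in the same grid cell
```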
Reposted by Aaron Roth
I try to avoid posting about politics here, but I feel compelled to say some things that should be obvious: 🧵
January 25, 2026 at 5:43 PM
Reposted by Aaron Roth
The NSF has played a key role in American science, and risks being collateral damage in the war against science.
#econsky #academicsky #NSF #science
marketdesigner.blogspot.com/2026/01/hist...
History of the U.S. National Science Foundation (NSF)
marketdesigner.blogspot.com
January 12, 2026 at 1:55 PM
Reposted by Aaron Roth
The paper is here: arxiv.org/abs/2601.05245 It's joint work with @ncollina.bsky.social, Jiuyao Lu, and George Noarov. Natalie and George are on the job market --- check them out. www.seas.upenn.edu/~ncollina/ noarov.com
January 9, 2026 at 1:21 PM
Not sure of the details, but I believe it's related to the experiment that STOC ran giving feedback with a version of Gemini Deep Think, which got generally positive reviews for critiquing math: research.google/blog/gemini-...
Gemini provides automated feedback for theoretical computer scientists at STOC 2026
research.google
January 9, 2026 at 3:01 PM
What's wrong with providing access to a fancy LLM to give feedback to authors about their own papers?
January 9, 2026 at 2:35 PM
But we ended up showing that this is impossible in full generality. The results in the paper also lay out a slightly more nuanced landscape, and there remain some interesting open questions about the power of reductions from multicalibration to marginal calibration. Take a look!
January 9, 2026 at 1:21 PM
This was a fun project in part because I didn't know what the right answer was. I started out believing that there should be a rate-preserving reduction from multicalibration to marginal calibration, lifting the (unknown) minimax calibration rates to multicalibration.
January 9, 2026 at 1:21 PM
Informally, it's because you can define instances and groups/subsequences that punish the learner for any deviation from honest forecasting. The honest strategy works broadly; anything that deviates from it necessarily "overfits" to the weak marginal calibration metric.
January 9, 2026 at 1:21 PM
It could have been that the minimax rates for the two problems were identical, up to a roughly logarithmic term in the number of subsequences, which is what the upper bounds pay. What we show is that they are fundamentally different --- you can't beat the "honest" T^{2/3} rate.
January 9, 2026 at 1:21 PM
What about for multicalibration? The same kinds of techniques that get T^{2/3} rates for calibration also work for multicalibration --- Blackwell Approachability, multiobjective optimization, etc. Morally this is because the "honest" strategy also gets multicalibration.
January 9, 2026 at 1:21 PM
It is much less clear that there are strategies that let you do this profitably against a worst-case adversary --- but that's exactly what Dagan et al. showed recently to establish O(T^{2/3 - eps}) rates for marginal calibration arxiv.org/abs/2406.13668 --- that was super surprising.
Breaking the $T^{2/3}$ Barrier for Sequential Calibration
A set of probabilistic forecasts is calibrated if each prediction of the forecaster closely approximates the empirical distribution of outcomes on the subset of timesteps where that prediction was mad...
arxiv.org
January 9, 2026 at 1:21 PM
Thinking about truthful forecasting is what gets you T^{2/3} rates. But maybe you could do better, by cleverly strategizing to arrange for cancellations of the random noise with intentional bias that you inject. It's easy to see that you can do this on particular sequences.
January 9, 2026 at 1:21 PM
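Here is a cartoon of that cancellation trick (a toy of my own devising, not the construction in the papers): once the 0.7 forecast bucket has drifted away from a perfect 70% rate, a forecaster who abandons truthfulness can keep saying 0.7 on days whose true rain probability pushes the bucket back, trading honest noise for intentional, self-canceling bias.

```python
# Toy cancellation of bucket noise with injected bias (not the papers' construction).
import numpy as np

rng = np.random.default_rng(2)
days = rng.random(200) < 0.7              # 200 honest 0.7-forecast days; outcomes are noisy
excess = days.sum() - 0.7 * len(days)     # how far the 0.7 bucket has drifted from perfect calibration
print(f"excess after honest play: {excess:+.1f}")

# Deviate from the truth: keep forecasting 0.7, but pick days whose true rain
# chance (0.2 if the bucket ran hot, 0.95 if it ran cold) pushes the excess back.
inject_p = 0.2 if excess > 0 else 0.95
while abs(excess) > 0.5:
    excess += float(rng.random() < inject_p) - 0.7
print(f"excess after injecting bias: {excess:+.1f}")
```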
Suppose you knew the chance of rain. One strategy is just forecast it truthfully. The bias of your predictions would be 0, but there would be noise: sometimes when you predict a 70% chance of rain it doesn't rain. The noise is larger the less frequently you make any given prediction.
January 9, 2026 at 1:21 PM
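A quick toy simulation of that point (my own, not from the paper): truthful forecasting has zero bias, but the gap between the forecast and the realized frequency in a bucket shrinks only like one over the square root of the number of days in that bucket.

```python
# Toy simulation: truthful forecasts of a known 70% rain probability.
# Bias is zero, but each forecast bucket pays sampling noise ~ 1/sqrt(n).
import numpy as np

rng = np.random.default_rng(0)
p = 0.7  # the true chance of rain, forecast truthfully every day
for n in (10, 100, 1000, 10000):
    rained = rng.random(n) < p     # outcomes on the n days forecast at 70%
    gap = abs(rained.mean() - p)   # realized frequency vs. the forecast
    print(f"n = {n:5d}   |empirical freq - 0.7| = {gap:.3f}   (1/sqrt(n) = {1/np.sqrt(n):.3f})")
```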
Calibration asks that forecasts behave like probabilities marginally over a sequence. Amongst all the days I predict a 70% chance of rain, it should rain 70% of the time, etc. Multicalibration asks for the same guarantee simultaneously on many pre-defined subsequences.
January 9, 2026 at 1:21 PM
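To make the definitions concrete, here is a minimal sketch (my own illustration, not code from the paper) of measuring marginal calibration error and multicalibration error for a finite sequence of forecasts; the function names and the normalization choice are mine.

```python
# Illustration only: bucketed (L1) calibration error and multicalibration error.
import numpy as np

def calibration_error(forecasts, outcomes):
    # Sum over distinct forecast values v of |(rainy days forecast at v) - v * (days forecast at v)|,
    # normalized by the number of days. Zero means every forecast bucket is perfectly calibrated.
    forecasts, outcomes = np.asarray(forecasts, float), np.asarray(outcomes, float)
    err = 0.0
    for v in np.unique(forecasts):
        bucket = forecasts == v
        err += abs(outcomes[bucket].sum() - v * bucket.sum())
    return err / len(forecasts)

def multicalibration_error(forecasts, outcomes, subsequences):
    # Worst calibration error over a collection of pre-defined subsequences (boolean
    # masks over days). Normalization conventions vary in the literature; here each
    # subsequence is normalized by its own length.
    forecasts, outcomes = np.asarray(forecasts, float), np.asarray(outcomes, float)
    return max(calibration_error(forecasts[s], outcomes[s])
               for s in subsequences if np.any(s))
```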
The paper is here: arxiv.org/abs/2601.05245 It's joint work with @ncollina.bsky.social, Jiuyao Lu, and George Noarov. Natalie and George are on the job market --- check them out. www.seas.upenn.edu/~ncollina/ noarov.com
January 9, 2026 at 1:21 PM
Excited about a new paper! Multicalibration turns out to be strictly harder than marginal calibration. We prove tight Omega(T^{2/3}) lower bounds for online multicalibration, separating it from online marginal calibration for which better rates were recently discovered.
January 9, 2026 at 1:21 PM
AI-assisted papers are very good at form. They are written in the voice of an experienced researcher, and so evade our old heuristics. We need to learn a new set of red flags. These include citation errors, vague gesturing to standard results, and other things that we will learn from experience.
December 30, 2025 at 3:01 PM
Yes. We already have a set of ingrained red flags for human written papers that signal a lack of care: not citing the relevant literature, not formatting or typesetting math correctly, etc. These don't mean the paper is wrong but they strongly correlate with lack of care. But...
I'd like to propose the following norm for peer review of papers. If a paper shows clear signs of LLM-generated errors that were not detected by the author, the paper should be immediately rejected. My reasoning: 1/ #ResearchIntegrity
December 30, 2025 at 3:01 PM
Reposted by Aaron Roth
Getting absurd over at the ACM…
Some may remember this ACM guidance on inclusive terminology. E.g., as advocated by an anon ICLR reviewer, it recommends against the technical term Byzantine.

It was recently updated, and suggests avoiding "binary classification" and "stable marriage" (incorrectly defined)
December 29, 2025 at 12:05 AM
STOC ran an experiment in which authors were able to use a Gemini model to check papers for mathematical errors before submission. It got positive feedback: research.google/blog/gemini-... - it is quite good at catching mathematical errors. Obv not a replacement for peer review but a useful tool.
Gemini provides automated feedback for theoretical computer scientists at STOC 2026
research.google
December 29, 2025 at 12:20 AM
So, many things will change --- I'm convinced that AI will be transformative for mathematical research. I think the changes will go beyond the day-to-day, and will extend to how we train our students and how we disseminate our work. The future is exciting and uncertain.
December 21, 2025 at 7:01 PM
And we are already seeing that reducing the time and effort needed to produce "a paper" (not a -good- paper) is going to destabilize our existing institutions for peer review. We need to figure out how to manage researcher attention at scale and not be drowned in research slop.
December 21, 2025 at 7:01 PM
A world in which clever discoveries happen in data centers, and the role of the professional researcher is careful verification and due diligence is a world in which the job of researcher is much less fun. Many fewer people with choices would want this job, given the other costs.
December 21, 2025 at 7:01 PM