And, again, a shoutout to my amazing coauthor Jeff Negrea! Working together has been a great pleasure!
Stay tuned for a follow-up: we've been working on using this viewpoint to understand other correlated perturbation-based algorithms.
I think you'd agree that "Bayesian Algorithms for Adversarial Online Learning: from Finite to Infinite Action Spaces" is a much better title than before. The old one was much harder to pronounce.
It allows us to guess what a good prior will be, and suggests ways to use probability as a tool to prove the algorithm works.
To achieve this, we develop a new probabilistic analysis of correlated Gaussian follow-the-perturbed-leader algorithms, of which ours is a special case.
This has been an open challenge in the area.
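For concreteness, here is a minimal sketch of what one round of a correlated Gaussian follow-the-perturbed-leader algorithm looks like on a finite action set. This is purely illustrative, not the paper's exact construction; the function name and the covariance matrix Sigma are assumptions.

import numpy as np

def correlated_ftpl_action(cum_rewards, Sigma, rng):
    # cum_rewards[x] = sum of the observed rewards y_s(x) so far.
    # Perturb the cumulative rewards with one draw from N(0, Sigma), where
    # Sigma encodes correlations between actions, then follow the leader.
    perturbation = rng.multivariate_normal(np.zeros(len(cum_rewards)), Sigma)
    return int(np.argmax(cum_rewards + perturbation))

Taking Sigma to be diagonal recovers the classical independent-perturbation case.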
For infinite action spaces, you can't use a prior with independence across actions. You need to share information between actions.
We do this by using a Gaussian process, with correlations between actions.
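To illustrate how a Gaussian process shares information between actions, here is a sketch for a one-dimensional action space, discretized to a grid purely for illustration. The squared-exponential kernel, the parameters, and the function name are all assumptions, not the paper's construction.

import numpy as np

def gp_perturbed_action(grid, cum_rewards, lengthscale, scale, rng):
    # grid: finite discretization of a continuous action space, e.g. np.linspace(0, 1, 200).
    # cum_rewards[i] = sum of the observed rewards for action grid[i] so far.
    # Squared-exponential kernel: nearby actions are strongly correlated, so
    # information about one action is shared with its neighbours.
    diffs = grid[:, None] - grid[None, :]
    K = scale**2 * np.exp(-0.5 * (diffs / lengthscale) ** 2)
    sample = rng.multivariate_normal(np.zeros(len(grid)), K + 1e-9 * np.eye(len(grid)))
    return grid[np.argmax(cum_rewards + sample)]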
For finite action spaces, you can use a Gaussian prior which is independent across actions.
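In code, this is the simplest case of the sketches above: one independent Gaussian perturbation per action, or equivalently a diagonal covariance. Again, just an illustrative sketch.

import numpy as np

def independent_ftpl_action(cum_rewards, sigma, rng):
    # One independent N(0, sigma^2) perturbation per action, then follow the leader.
    return int(np.argmax(cum_rewards + sigma * rng.standard_normal(len(cum_rewards))))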
What about "from Finite to Infinite Action Spaces"?
This covers the two settings we show the aforementioned results in.
We're pretending to know a distribution for how the adversary will act in the future.
But, in reality, they can do anything.
And yet... we show that this works!
What's the strategy?
It's really simple:
- Place a prior distribution over what the adversary will do in the future
- Condition on what the adversary has done
- Sample from the posterior
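A minimal sketch of these three steps for a finite action set. The hypothetical sample_remaining_rewards callable stands in for whichever probabilistic model the learner chooses; this is not the paper's exact construction.

import numpy as np

def bayesian_round(cum_rewards, sample_remaining_rewards, rng):
    # Steps 1 and 2 (prior + conditioning on what the adversary has done) are
    # bundled into sample_remaining_rewards: given the observed cumulative
    # rewards, it draws one plausible completion of each action's future rewards.
    imagined_future = sample_remaining_rewards(cum_rewards, rng)  # step 3: posterior sample
    # Play the action that would be best if the sampled future were the truth.
    return int(np.argmax(cum_rewards + imagined_future))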
But this is a two-player zero-sum game, not a joint probability distribution.
So you can't just solve it by applying Bayes' Rule. Or can you?
We propose "Bayesian Algorithms" for this.
What does that mean? Let's unpack.
This is in contrast to other approaches to resolving explore-exploit tradeoffs, such as upper confidence bounds, which produce purely deterministic strategies.
Because many other hard decision problems can be reduced to online learning, including certain forms of reinforcement learning (via decision-estimation coefficients), equilibrium computation (via no-regret dynamics), and others.
R(p,q) = E_{x_t ~ p_t, y_t ~ q_t}[ \sup_{x \in X} \sum_{t=1}^T y_t(x) - \sum_{t=1}^T y_t(x_t) ].
Meaning, the learner compares the sum of their rewards y_t(x_t) with the sum of rewards y_t(x) for the best single, non-time-dependent action x.
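As a worked example for a finite action set, here is the quantity inside the expectation computed for one realized run. Y and xs are hypothetical names for the realized reward functions and the learner's sampled actions.

import numpy as np

def realized_regret(Y, xs):
    # Y[t, x] = y_t(x), the adversary's reward for action x at round t.
    # xs[t]   = the learner's sampled action x_t at round t.
    best_fixed_total = Y.sum(axis=0).max()            # best single non-time-dependent action
    learner_total = Y[np.arange(len(xs)), xs].sum()   # sum of y_t(x_t)
    return best_fixed_total - learner_total           # average over runs to estimate R(p, q)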
So why is this game not impossible?
Because the learner only compares how well they do against the best single fixed action for the *sum* of the adversary's rewards, not against the best action at every individual round.
At each time point:
- The learner chooses a distribution of predictions p_t over an action space X.
- The adversary chooses a reward function y_t : X -> R.
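A minimal sketch of this interaction loop for a finite action set. The learner and adversary callables are illustrative stand-ins, not interfaces from the paper.

import numpy as np

def run_game(learner, adversary, n_actions, T, rng):
    # learner(history)   -> probability vector p_t over the n_actions actions
    # adversary(history) -> reward vector y_t with one entry per action
    history = []
    for t in range(T):
        p_t = learner(history)
        y_t = adversary(history)
        x_t = rng.choice(n_actions, p=p_t)  # the learner's action is drawn from p_t
        history.append((x_t, y_t))
    return history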
Now, let's unpack the new title!
Let's start with what we mean by "Adversarial Online Learning".
Paper link: arxiv.org/abs/2503.19136
Link to my student's tweets on this work: x.com/sholalkere/s...
Link: arxiv.org/abs/2502.14790