Alexander Terenin - on the faculty job market
@avt.im
Decision-making under uncertainty, machine learning theory, artificial intelligence · anti-ideological · Assistant Research Professor, Cornell

https://avt.im/ · https://scholar.google.com/citations?user=EGKYdiwAAAAJ&sortby=pubdate
Check out the work at: arxiv.org/abs/2502.14790

And, again, shoutout to my amazing coauthor Jeff Negrea! Working together has been a great pleasure!

Stay tuned for follow-up: we've been working on using this viewpoint to understand other correlated perturbation-based algorithms.
Bayesian Algorithms for Adversarial Online Learning: from Finite to Infinite Action Spaces
We develop a form of Thompson sampling for online learning under full feedback - also known as prediction with expert advice - where the learner's prior is defined over the space of an adversary's future...
arxiv.org
September 29, 2025 at 7:55 PM
So this sums up the work! If you followed along, thanks for the interest!

I think you'd agree that "Bayesian Algorithms for Adversarial Online Learning: from Finite to Infinite Action Spaces" is a much better title than before. The old one was much harder to pronounce.
September 29, 2025 at 7:55 PM
The Bayesian viewpoint proves useful for developing this analysis.

It guides our guess of what a good prior should be, and suggests ways to use probability as a tool in proving that the algorithm works.
September 29, 2025 at 7:55 PM
We prove that the Bayesian approach works in this setting too.

To achieve this, we develop a new probabilistic analysis of correlated Gaussian follow-the-perturbed-leader algorithms, of which ours is a special case.

This has been an open challenge in the area.
September 29, 2025 at 7:55 PM
The second one is where X = [0,1]^d and Y is the space of bounded Lipschitz functions.

Here, you can't use a prior with independence across actions. You need to share information between actions.

We do this by using a Gaussian process, with correlations between actions.
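
Here's a small sketch of that idea - my own illustration, on a one-dimensional grid with an arbitrary squared-exponential kernel, not the paper's exact construction - where a single Gaussian process draw gives a correlated perturbation, so nearby actions receive similar values:

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)  # discretized action space X = [0, 1] (d = 1 for simplicity)

# Squared-exponential kernel: nearby actions get highly correlated values.
lengthscale = 0.1
K = np.exp(-0.5 * (grid[:, None] - grid[None, :]) ** 2 / lengthscale**2)

# One draw from the Gaussian process prior, via a (jittered) Cholesky factor.
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(grid)))
perturbation = L @ rng.standard_normal(len(grid))

# A correlated-perturbation step: perturb cumulative rewards and maximize.
cum_rewards = np.zeros(len(grid))  # placeholder: sum of observed rewards on the grid
x_t = grid[np.argmax(cum_rewards + perturbation)]
```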
September 29, 2025 at 7:55 PM
The first one is the classical discrete setting where standard algorithms such as exponential weights are studied.

You can use a Gaussian prior which is independent across actions.
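
As a rough sketch of what that looks like - my own illustration, assuming an i.i.d. N(0,1) prior on each action's per-round rewards, not the paper's exact construction - conditioning on the past leaves a Gaussian posterior over each action's total reward, so posterior sampling reduces to perturbing cumulative rewards with independent Gaussian noise, i.e., a Gaussian follow-the-perturbed-leader step:

```python
import numpy as np

rng = np.random.default_rng(0)
num_actions, T = 5, 100
cum_rewards = np.zeros(num_actions)  # observed cumulative reward per action

for t in range(T):
    # Posterior over each action's total reward: observed past (known) plus
    # the prior over the T - t remaining rounds, i.e. N(cum_rewards, T - t).
    sampled_totals = cum_rewards + rng.normal(0.0, np.sqrt(T - t), size=num_actions)
    x_t = int(np.argmax(sampled_totals))  # best response to the posterior sample
    # Placeholder adversary; in the adversarial setting y_t can be arbitrary.
    y_t = rng.uniform(0.0, 1.0, size=num_actions)
    cum_rewards += y_t  # full feedback: the whole reward function is revealed
```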
September 29, 2025 at 7:55 PM
Okay, so we now know what "Bayesian Algorithms for Adversarial Online Learning" are.

What about "from Finite to Infinite Action Spaces"?

This refers to the two settings in which we prove the aforementioned results.
September 29, 2025 at 7:55 PM
At first glance, this approach makes no sense: the Bayesian model is completely fake.

We're pretending to know a distribution for how the adversary will act in the future.

But, in reality, they can do anything.

And yet... we show that this works!
September 29, 2025 at 7:55 PM
We show that this game secretly admits a natural Bayesian strategy - one that turns out to be strong.

What's the strategy?

It's really simple:
- Place a prior distribution over what the adversary will do in the future
- Condition on what the adversary has done
- Sample from the posterior
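
Here's a toy instantiation of that recipe - hypothetical, purely for illustration - using a discrete prior over a handful of candidate reward sequences, conditioned via Bayes' rule with a 0/1 likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)
num_actions, T = 3, 4
# Hypothetical prior: uniform over a small set of candidate reward sequences,
# each a (T, num_actions) table of the adversary's future reward functions.
futures = rng.uniform(0.0, 1.0, size=(8, T, num_actions))
prior = np.full(len(futures), 1.0 / len(futures))

observed = []  # reward functions y_1, ..., y_{t-1} seen so far
for t in range(T):
    # Condition on the past: zero out candidates inconsistent with what
    # the adversary has actually played (Bayes' rule with a 0/1 likelihood).
    consistent = np.array([
        all(np.allclose(f[s], y_s) for s, y_s in enumerate(observed))
        for f in futures
    ], dtype=float)
    posterior = prior * consistent
    posterior /= posterior.sum()
    # Sample a candidate future from the posterior and best-respond to it.
    sampled = futures[rng.choice(len(futures), p=posterior)]
    x_t = int(np.argmax(sampled[t:].sum(axis=0)))
    # The adversary reveals y_t; here it simply replays one of the candidates.
    y_t = futures[0][t]
    observed.append(y_t)
```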
September 29, 2025 at 7:55 PM
One observation about adversarial online learning is that it appears to have nothing to do with Bayesian learning.

There is a two-player zero-sum game, not a joint probability distribution.

So you can't just solve it by applying Bayes' Rule. Or can you?
September 29, 2025 at 7:55 PM
Okay, so now we understand what "Adversarial Online Learning" is.

We propose "Bayesian Algorithms" for this.

What does that mean? Let's unpack.
September 29, 2025 at 7:55 PM
Online learning is therefore a good model for learning to explore by taking random actions.

This is in contrast to other approaches to resolving explore-exploit tradeoffs, such as upper confidence bounds, which produce purely deterministic strategies.
September 29, 2025 at 7:55 PM
So that's our setting. Why's it interesting?

Because many other hard decision problems can be reduced to online learning, including certain forms of reinforcement learning (via decision-estimation coefficients), equilibrium computation (via no-regret dynamics), and others.
September 29, 2025 at 7:55 PM
Specifically, their goal is to minimize the regret

R(p,q) = E_{x_t ~ p_t, y_t ~ q_t}[ \sup_{x \in X} \sum_{t=1}^T y_t(x) - \sum_{t=1}^T y_t(x_t) ].

That is, the learner compares the sum of their rewards y_t(x_t) with the sum of rewards y_t(x) for the best single, non-time-dependent action x.
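
In code, the realized regret for a finite action space looks like the sketch below - my own illustration; the expectation in the formula is over the players' randomness, which this omits:

```python
import numpy as np

def realized_regret(Y, actions):
    """Regret of the played actions against the best fixed action in hindsight.

    Y: (T, num_actions) array with Y[t, x] = y_t(x).
    actions: length-T integer array of the learner's choices x_t.
    """
    best_fixed = Y.sum(axis=0).max()                    # sup_x sum_t y_t(x)
    earned = Y[np.arange(len(actions)), actions].sum()  # sum_t y_t(x_t)
    return best_fixed - earned
```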
September 29, 2025 at 7:55 PM
The learner's goal is to achieve the highest rewards possible. But at each time, the adversary can choose a different reward function.

So why is this game not impossible?

Because the learner is only compared against the *sum* of rewards of the best fixed action - not against the best action at each individual round.
September 29, 2025 at 7:55 PM
"Adversarial Online Learning" refers to a two-player zero-sum repeated game between a learner and adversary.

At each time point:
- The learner chooses a distribution of predictions p_t over an action space X.
- The adversary chooses a reward function y_t : X -> R.
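
To make this concrete, here's a minimal sketch of the protocol in Python - my own illustration, not code from the paper - with a finite action space and a random placeholder adversary (in the true adversarial setting, y_t can be chosen arbitrarily, even with knowledge of p_t):

```python
import numpy as np

rng = np.random.default_rng(0)
num_actions, T = 5, 100

for t in range(T):
    # Learner chooses a distribution p_t over the action space X.
    p_t = np.full(num_actions, 1.0 / num_actions)  # placeholder: uniform
    x_t = rng.choice(num_actions, p=p_t)           # learner's sampled action
    # Adversary chooses a reward function y_t : X -> R. Random here, but in
    # the adversarial setting it can be arbitrary.
    y_t = rng.uniform(0.0, 1.0, size=num_actions)
    reward = y_t[x_t]  # learner receives y_t(x_t); under full feedback, all of y_t is revealed
```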
September 29, 2025 at 7:55 PM
First, a link: arxiv.org/abs/2502.14790

Now, let's unpack the new title!

Let's start with what we mean by "Adversarial Online Learning".
Bayesian Algorithms for Adversarial Online Learning: from Finite to Infinite Action Spaces
We develop a form of Thompson sampling for online learning under full feedback - also known as prediction with expert advice - where the learner's prior is defined over the space of an adversary's future...
arxiv.org
September 29, 2025 at 7:55 PM
Project page: geospsr.github.io
Paper link: arxiv.org/abs/2503.19136
Link to my student's tweets on this work: x.com/sholalkere/s...
Stochastic Poisson Surface Reconstruction with One Solve using Geometric Gaussian Processes
geospsr.github.io
July 17, 2025 at 6:24 PM
Check out all of this season's seminars here: gp-seminar-series.github.io
Virtual Seminar Series on Bayesian Decision-making and Uncertainty
gp-seminar-series.github.io
June 2, 2025 at 4:06 PM
If you're interested in this work, check out the updated preprint! It's got a lot of new stuff!

Link: arxiv.org/abs/2502.14790
An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces
We develop a form of Thompson sampling for online learning under full feedback - also known as prediction with expert advice - where the learner's prior is defined over the space of an adversary's future...
arxiv.org
May 30, 2025 at 5:42 PM