And, again, a shoutout to my amazing coauthor Jeff Negrea! Working together has been a great pleasure!
Stay tuned for a follow-up: we've been working on using this viewpoint to understand other correlated perturbation-based algorithms.
I think you'd agree that "Bayesian Algorithms for Adversarial Online Learning: from Finite to Infinite Action Spaces" is a much better title than before. The old one was much harder to pronounce.
It allows us to guess what a good prior will be, and suggests ways to use probability as a tool to prove the algorithm works.
To achieve this, we develop a new probabilistic analysis of correlated Gaussian follow-the-perturbed-leader algorithms, of which ours is a special case.
This has been an open challenge in the area.
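For concreteness, here is a minimal sketch of what one round of a correlated Gaussian follow-the-perturbed-leader algorithm looks like on a finite action set. This is purely illustrative, not the paper's exact construction; the function name and the covariance matrix Sigma are assumptions.

import numpy as np

def correlated_ftpl_action(cum_rewards, Sigma, rng):
    # cum_rewards[x] = sum of the observed rewards y_s(x) so far.
    # Perturb the cumulative rewards with one draw from N(0, Sigma), where
    # Sigma encodes correlations between actions, then follow the leader.
    perturbation = rng.multivariate_normal(np.zeros(len(cum_rewards)), Sigma)
    return int(np.argmax(cum_rewards + perturbation))

Taking Sigma to be diagonal recovers the classical independent-perturbation case.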
For infinite action spaces, you can't use a prior with independence across actions. You need to share information between actions.
We do this by using a Gaussian process, with correlations between actions.
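To illustrate how a Gaussian process shares information between actions, here is a sketch for a one-dimensional action space, discretized to a grid purely for illustration. The squared-exponential kernel, the parameters, and the function name are all assumptions, not the paper's construction.

import numpy as np

def gp_perturbed_action(grid, cum_rewards, lengthscale, scale, rng):
    # grid: finite discretization of a continuous action space, e.g. np.linspace(0, 1, 200).
    # cum_rewards[i] = sum of the observed rewards for action grid[i] so far.
    # Squared-exponential kernel: nearby actions are strongly correlated, so
    # information about one action is shared with its neighbours.
    diffs = grid[:, None] - grid[None, :]
    K = scale**2 * np.exp(-0.5 * (diffs / lengthscale) ** 2)
    sample = rng.multivariate_normal(np.zeros(len(grid)), K + 1e-9 * np.eye(len(grid)))
    return grid[np.argmax(cum_rewards + sample)]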
For finite action spaces, you can use a Gaussian prior which is independent across actions.
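In code, this is the simplest case of the sketches above: one independent Gaussian perturbation per action, or equivalently a diagonal covariance. Again, just an illustrative sketch.

import numpy as np

def independent_ftpl_action(cum_rewards, sigma, rng):
    # One independent N(0, sigma^2) perturbation per action, then follow the leader.
    return int(np.argmax(cum_rewards + sigma * rng.standard_normal(len(cum_rewards))))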
What about "from Finite to Infinite Action Spaces"?
This covers the two settings we show the aforementioned results in.
We're pretending to know a distribution for how the adversary will act in the future.
But, in reality, they can do anything.
And yet... we show that this works!
What's the strategy?
It's really simple:
- Place a prior distribution over what the adversary will do in the future
- Condition on what the adversary has done
- Sample from the posterior
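A minimal sketch of these three steps for a finite action set. The hypothetical sample_remaining_rewards callable stands in for whichever probabilistic model the learner chooses; this is not the paper's exact construction.

import numpy as np

def bayesian_round(cum_rewards, sample_remaining_rewards, rng):
    # Steps 1 and 2 (prior + conditioning on what the adversary has done) are
    # bundled into sample_remaining_rewards: given the observed cumulative
    # rewards, it draws one plausible completion of each action's future rewards.
    imagined_future = sample_remaining_rewards(cum_rewards, rng)  # step 3: posterior sample
    # Play the action that would be best if the sampled future were the truth.
    return int(np.argmax(cum_rewards + imagined_future))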
But this is a two-player zero-sum game, not a joint probability distribution.
So you can't just solve it by applying Bayes' Rule. Or can you?
We propose "Bayesian Algorithms" for this.
What does that mean? Let's unpack.
This is in contrast to other approaches to resolving explore-exploit tradeoffs, such as upper confidence bounds, which produce purely deterministic strategies.
Because many other hard decision problems can be reduced to online learning, including certain forms of reinforcement learning (via decision-estimation coefficients), equilibrium computation (via no-regret dynamics), and others.
R(p,q) = E_{x_t ~ p_t, y_t ~ q_t}[ \sup_{x \in X} \sum_{t=1}^T y_t(x) - \sum_{t=1}^T y_t(x_t) ].
Meaning, the learner compares the sum of their rewards y_t(x_t) with the sum of rewards y_t(x) for the best single, non-time-dependent action x.
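As a worked example for a finite action set, here is the quantity inside the expectation computed for one realized run. Y and xs are hypothetical names for the realized reward functions and the learner's sampled actions.

import numpy as np

def realized_regret(Y, xs):
    # Y[t, x] = y_t(x), the adversary's reward for action x at round t.
    # xs[t]   = the learner's sampled action x_t at round t.
    best_fixed_total = Y.sum(axis=0).max()            # best single non-time-dependent action
    learner_total = Y[np.arange(len(xs)), xs].sum()   # sum of y_t(x_t)
    return best_fixed_total - learner_total           # average over runs to estimate R(p, q)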
So why is this game not impossible?
Because the learner only compares how well they do against the best single fixed action for the *sum* of the adversary's rewards, not against the best action at every individual round.
At each time point:
- The learner chooses a distribution of predictions p_t over an action space X.
- The adversary chooses a reward function y_t : X -> R.
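A minimal sketch of this interaction loop for a finite action set. The learner and adversary callables are illustrative stand-ins, not interfaces from the paper.

import numpy as np

def run_game(learner, adversary, n_actions, T, rng):
    # learner(history)   -> probability vector p_t over the n_actions actions
    # adversary(history) -> reward vector y_t with one entry per action
    history = []
    for t in range(T):
        p_t = learner(history)
        y_t = adversary(history)
        x_t = rng.choice(n_actions, p=p_t)  # the learner's action is drawn from p_t
        history.append((x_t, y_t))
    return history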
Now, let's unpack the new title!
Let's start with what we mean by "Adversarial Online Learning".
Paper link: arxiv.org/abs/2503.19136
Link to my student's tweets on this work: x.com/sholalkere/s...
Link: arxiv.org/abs/2502.14790