Lightnews — Scholar-powered news

Kris Reyes

@csms.io

Or (and this may seem blasphemous to ML people), you can just use educated guesstimates of the hyperparameters. This is, I would argue, more Bayesian than MLE-based hyperparameter tuning, as the guesstimates reflect prior knowledge of your system.

April 16, 2025 at 7:41 PM
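
A minimal sketch of the guesstimate route in the post above, assuming a zero-mean GP with a squared-exponential kernel. The data and the three hyperparameter values are hypothetical stand-ins for what you might know about your system; nothing is optimized.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale, signal_var):
    """Squared-exponential covariance between two 1-D input arrays."""
    diff = xa[:, None] - xb[None, :]
    return signal_var * np.exp(-0.5 * (diff / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, lengthscale, signal_var, noise_var):
    """Posterior mean and variance of the latent function at x_test."""
    K = rbf_kernel(x_train, x_train, lengthscale, signal_var)
    K += noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, lengthscale, signal_var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    # Prior variance at each test point is signal_var for this kernel.
    var = signal_var - np.sum(v ** 2, axis=0)
    return mean, var

# Hypothetical small dataset.
x_train = np.array([0.1, 0.5, 0.9])
y_train = np.array([0.2, 0.8, 0.3])
x_test = np.linspace(0.0, 1.0, 50)

# Educated guesses (assumptions): the response varies over roughly a third
# of the input range, has unit-scale amplitude, and noise std of about 0.1.
mean, var = gp_posterior(x_train, y_train, x_test,
                         lengthscale=0.3, signal_var=1.0, noise_var=0.01)
print(mean[:3], var[:3])
```

The only cost here is one Cholesky factorization; there is no training loop to run.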

Kris Reyes

@csms.io

Or you can use MAP estimates instead of maximum likelihood estimates, incorporating prior knowledge to regularize the ill-posed maximum likelihood calculation.

April 16, 2025 at 7:39 PM
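
A rough sketch of the MAP idea in the post above: add a log-prior term to the log marginal likelihood and maximize the sum. The log-normal prior on the lengthscale, its parameters, the fixed noise variance, and the data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale, signal_var=1.0):
    """Squared-exponential covariance between two 1-D input arrays."""
    diff = xa[:, None] - xb[None, :]
    return signal_var * np.exp(-0.5 * (diff / lengthscale) ** 2)

def log_marginal_likelihood(x, y, lengthscale, noise_var=1e-2):
    """Log evidence log p(y | x, lengthscale) of a zero-mean GP."""
    K = rbf_kernel(x, x, lengthscale) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(x) * np.log(2.0 * np.pi))

def log_prior(lengthscale, mu=np.log(0.3), sigma=0.5):
    """Log density of an assumed log-normal prior on the lengthscale."""
    z = (np.log(lengthscale) - mu) / sigma
    return -0.5 * z ** 2 - np.log(lengthscale * sigma * np.sqrt(2.0 * np.pi))

# Hypothetical small dataset.
x_train = np.array([0.1, 0.5, 0.9])
y_train = np.array([0.2, 0.8, 0.3])

# A grid search keeps the sketch simple; any optimizer could be used instead.
lengthscales = np.logspace(-2, 1, 200)
lml = np.array([log_marginal_likelihood(x_train, y_train, l)
                for l in lengthscales])
log_post = lml + log_prior(lengthscales)

ls_mle = lengthscales[np.argmax(lml)]
ls_map = lengthscales[np.argmax(log_post)]
print(f"MLE lengthscale: {ls_mle:.3f}, MAP lengthscale: {ls_map:.3f}")
```

The prior term penalizes lengthscales far from what you believe is plausible, which is where the regularization comes from.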

Kris Reyes

@csms.io

What is the alternative to maximum-likelihood estimates of the hyperparameters of a GP model? You can place hierarchical beliefs on these hyperparameters. This shifts the computational burden from likelihood optimization (to "train" a model) to methods such as MCMC that sample from the posterior distribution.

April 16, 2025 at 7:37 PM
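
One possible sketch of the hierarchical route in the post above: put a hyperprior on the log-lengthscale and draw posterior samples with random-walk Metropolis, a basic MCMC method. The Gaussian hyperprior, step size, chain length, burn-in, and data are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale, signal_var=1.0):
    """Squared-exponential covariance between two 1-D input arrays."""
    diff = xa[:, None] - xb[None, :]
    return signal_var * np.exp(-0.5 * (diff / lengthscale) ** 2)

def log_marginal_likelihood(x, y, lengthscale, noise_var=1e-2):
    """Log evidence log p(y | x, lengthscale) of a zero-mean GP."""
    K = rbf_kernel(x, x, lengthscale) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(x) * np.log(2.0 * np.pi))

def log_posterior(log_ls, x, y, mu=np.log(0.3), sigma=0.5):
    """Unnormalized log posterior over u = log(lengthscale).

    Assumes a Gaussian hyperprior on u, so no Jacobian term is needed
    when sampling directly in u.
    """
    log_hyperprior = -0.5 * ((log_ls - mu) / sigma) ** 2
    return log_marginal_likelihood(x, y, np.exp(log_ls)) + log_hyperprior

rng = np.random.default_rng(0)

# Hypothetical small dataset.
x_train = np.array([0.1, 0.5, 0.9])
y_train = np.array([0.2, 0.8, 0.3])

n_steps, burn_in, step_size = 2000, 500, 0.5
current = np.log(0.3)
current_lp = log_posterior(current, x_train, y_train)
samples = np.empty(n_steps)
accepted = 0
for i in range(n_steps):
    proposal = current + step_size * rng.normal()
    proposal_lp = log_posterior(proposal, x_train, y_train)
    # Metropolis acceptance rule for a symmetric proposal.
    if np.log(rng.uniform()) < proposal_lp - current_lp:
        current, current_lp = proposal, proposal_lp
        accepted += 1
    samples[i] = current

lengthscale_samples = np.exp(samples[burn_in:])
lo, hi = np.percentile(lengthscale_samples, [5, 95])
print(f"acceptance rate: {accepted / n_steps:.2f}")
print(f"90% posterior interval for the lengthscale: [{lo:.3f}, {hi:.3f}]")
```

Predictions would then be averaged over `lengthscale_samples` rather than computed at a single point estimate.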

Kris Reyes

@csms.io

Hyperparameters need to be set based on prior information or tuned to data (if you must) using empirical Bayes methods.

April 16, 2025 at 7:34 PM

Kris Reyes

@csms.io

Second, when you're using GPs in a Bayesian context -- representing priors for an unknown function -- naively tuning the hyperparameters of the prior based on data goes against the Bayesian philosophy.

April 16, 2025 at 7:34 PM

Kris Reyes

@csms.io

First, "training" the model, i.e. hyperparameter tuning by calculating maximum likelihood estimates, is ill-posed:

www.jmlr.org/papers/v24/2...

This is magnified in low-data settings.

Maximum likelihood estimation in Gaussian process regression is ill-posed

April 16, 2025 at 7:31 PM
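
A small sketch for getting a feel for the low-data issue in the post above: evaluate the log marginal likelihood on a grid of lengthscales and see how wide a range scores nearly as well as the maximizer. This does not reproduce the linked paper's argument; the three data points, the fixed noise variance, and the 1-nat threshold are arbitrary assumptions.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale, signal_var=1.0):
    """Squared-exponential covariance between two 1-D input arrays."""
    diff = xa[:, None] - xb[None, :]
    return signal_var * np.exp(-0.5 * (diff / lengthscale) ** 2)

def log_marginal_likelihood(x, y, lengthscale, noise_var=1e-2):
    """Log evidence log p(y | x, lengthscale) of a zero-mean GP."""
    K = rbf_kernel(x, x, lengthscale) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(x) * np.log(2.0 * np.pi))

# Hypothetical low-data setting: three observations.
x_train = np.array([0.1, 0.5, 0.9])
y_train = np.array([0.2, 0.8, 0.3])

lengthscales = np.logspace(-2, 1, 200)
lml = np.array([log_marginal_likelihood(x_train, y_train, l)
                for l in lengthscales])

best = lengthscales[np.argmax(lml)]
# Grid values whose log evidence is within 1 nat of the maximum.
near_best = lengthscales[lml > lml.max() - 1.0]
print(f"grid maximizer: {best:.3f}")
print(f"within 1 nat of the maximum: "
      f"{near_best.min():.3f} to {near_best.max():.3f}")
```

If the near-optimal range is wide, the data alone are saying little about the lengthscale, and a point estimate hides that.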

Kris Reyes

@csms.io

So both implementers and users of GP libraries dedicate a disproportionate (IMO) amount of effort to optimizing hyperparameters -- at least in small-data settings. This is not great for a few reasons:

April 16, 2025 at 7:30 PM

Kris Reyes

@csms.io

That is, people who first work with GPs from an ML perspective look for parameters to optimize from data, and this becomes their primary preoccupation. Desperate to fit into the ML perspective, they turn to the only "parameters" present in a GP: the hyperparameters of the mean and covariance functions.

April 16, 2025 at 6:30 PM