Lightnews — Scholar-powered news

Aki Vehtari

@avehtari.bsky.social

6.2K followers 250 following 230 posts

Professor in computational Bayesian modeling at Aalto University, Finland. Bayesian Data Analysis 3rd ed, Regression and Other Stories, and Active Statistics co-author. #mcmc_stan and #arviz developer.

Web page https://users.aalto.fi/~ave/

Aki Vehtari

@avehtari.bsky.social

Sea view today close to my home

Photo of slightly foggy calm sea and islands

November 11, 2025 at 12:24 PM

Aki Vehtari

@avehtari.bsky.social

5 mins ago

October 29, 2025 at 6:08 PM

Aki Vehtari

@avehtari.bsky.social

October 21, 2025 at 3:58 PM

Aki Vehtari

@avehtari.bsky.social

I was two weeks on vacation in sunny and warm Sardinia, Italy, played in a beach ultimate tournament (this year we were the 4th best team in the world, and the best non-USA team), learned to kite surf, snorkeled, and ate lots of delicious food and gelato

Photo of many kite surfers at Punta Trettu, Sardinia

October 6, 2025 at 9:21 AM

Aki Vehtari

@avehtari.bsky.social

And the discrete rootogram with white background

September 3, 2025 at 12:10 PM

Aki Vehtari

@avehtari.bsky.social

bayesplot 1.14.0 CRAN release mc-stan.org/bayesplot/ with contributions from @tjmahr.com, Behram Ulukır, and @teemusailynoja.bsky.social

My favorite new feature is the discrete style ppc_rootogram() as proposed in teemusailynoja.github.io/visual-predi... and shown below

1/3

A discrete rootogram plot. The unmodified counts are displayed on the vertical axis, which uses square-root scaling, thus enabling direct reading of the count frequencies. The predictions are displayed as light blue points and interval lines. The observations are overlaid with darker points, except that observations outside the predictive credible intervals are highlighted in red.

September 3, 2025 at 12:09 PM

Aki Vehtari

@avehtari.bsky.social

We tested the accuracy of the MCSE with 41 posteriordb posteriors of varying complexity, plus one Birthdays posterior. The MCSE matches the variation in repeated runs of MCMC and bridge sampling well. Most of the variation in bridge sampling accuracy is explained by the number of dimensions.

posteriordb posteriors + Birthdays: The left plot shows the estimated MCSE vs the standard deviation of log marginal likelihood estimates from repeated MCMC runs. The MCSE estimate has a small bias except for a few posteriors with the highest variability.

posteriordb posteriors + Birthdays: The number of posterior dimensions vs standard deviation of log marginal likelihood estimates from repeated MCMC runs. The estimate variability tends to increase with the number of dimensions, which is natural due to the curse of dimensionality. In addition, there is variation depending on how non-normal the posterior distribution is.

August 21, 2025 at 5:04 PM

Aki Vehtari

@avehtari.bsky.social

For categorical and ordinal data, a series of calibration plots can be used. The plots below show one of these calibration plots for Model 1 and Model 2 (the same as in the first post in this thread). The red line lying outside the blue envelope most of the time indicates that Model 1 is misspecified. 3/4

PAV-adjusted calibration plots for Models 1 and 2

August 13, 2025 at 2:34 PM

Aki Vehtari

@avehtari.bsky.social

Instead of PPC bar graphs, it is better to look at the calibration of the predictive probabilities with binned calibration plots or, even better, with PAV-adjusted calibration plots. 2/4

Binned and PAV-adjusted calibration plot

August 13, 2025 at 2:34 PM
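
As a rough illustration of the idea in the post above, the sketch below computes PAV-adjusted calibration values with a hand-written pool-adjacent-violators routine: the observed 0/1 outcomes are sorted by predicted probability and then replaced by their nondecreasing least-squares fit. The function name `pav_calibration` and the simulated data are hypothetical, ties in the predicted probabilities are not treated specially, and this is not claimed to be the implementation behind the plots in the thread.

```python
import numpy as np

def pav_calibration(p, y):
    """Return (sorted predicted probabilities, PAV-adjusted event rates).

    p: predicted probabilities of the event, y: observed 0/1 outcomes.
    The adjusted rates are the isotonic (nondecreasing) least-squares
    fit of y along the ordering given by p.
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(p, kind="stable")
    p_sorted, y_sorted = p[order], y[order]

    # Each block stores [sum of outcomes, number of observations].
    blocks = []
    for value in y_sorted:
        blocks.append([value, 1])
        # Pool adjacent blocks while their means violate monotonicity.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count

    fitted = np.concatenate([np.full(n, s / n) for s, n in blocks])
    return p_sorted, fitted

# Simulated example (hypothetical data): the stated probabilities are made
# more extreme than the true ones, mimicking an over-confident model.
rng = np.random.default_rng(1)
true_p = rng.uniform(0.05, 0.95, size=2000)
y = rng.binomial(1, true_p)
overconfident_p = true_p**2 / (true_p**2 + (1 - true_p) ** 2)
p_sorted, cep = pav_calibration(overconfident_p, y)
```

Plotting `cep` against `p_sorted` gives the calibration curve; for a well-calibrated model it would stay close to the diagonal. The envelope shown in the plots is not computed here.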

Aki Vehtari

@avehtari.bsky.social

Posterior predictive checking of binary, categorical and many ordinal models with bar graphs is useless. Even the simplest models without covariates usually have such intercept terms that category specific probabilities are learned perfectly. Can you guess which model, 1 or 2, is misspecified? 1/4

Useless posterior predictive checking bar graphs for Models 1 and 2

August 13, 2025 at 2:34 PM

Aki Vehtari

@avehtari.bsky.social

It's sometimes difficult to get the focus needed for book writing, but this place was perfect for me

A man sitting on a chair and writing with a laptop next to a Finnish lake on a sunny summer day

August 4, 2025 at 11:20 AM

Aki Vehtari

@avehtari.bsky.social

Based on this photo from the 1920s at Helsinki University of Technology (which was later merged into Aalto University), they were also teaching how to draw an owl! (cc @rmcelreath.bsky.social)

Photo of an ornamental painting class at Helsinki University of Technology in the 1920s with a stuffed owl on a table

July 31, 2025 at 10:48 AM

Aki Vehtari

@avehtari.bsky.social

A new revised version of "Uncertainty in Bayesian leave-one-out cross-validation based model comparison" with Sivula, @mansmag.bsky.social, and Matamoros. We have clarified the goal of the paper and made it clearer that the uncertainty is described by the posterior of the unknown elpd difference, 1/4

June 23, 2025 at 7:11 AM

Aki Vehtari

@avehtari.bsky.social

The best gelato in Finland

June 4, 2025 at 12:37 PM

Aki Vehtari

@avehtari.bsky.social

In the morning I gave a talk about Bayesian cross-validation at KU Leuven, and in the afternoon I got to wear a Belgian academic gown and hear David Spiegelhalter's honorary doctorate talk, which was great

May 28, 2025 at 3:49 PM

Aki Vehtari

@avehtari.bsky.social

We went to Mordor and all we got were flowers and ice cream.

The Bayesian workflow group was a runner-up in the Aalto Open Science Award 2024. The current and past group members running-up, in alphabetical order: Alejandro Catalina, Anna Riha, Asael Alonzo Matamoros, David Kohns, ...

Photo of four persons in front of a sign saying "Mordor". One of the persons is holding flowers and another one is holding an ice cream.

May 20, 2025 at 12:29 PM

Aki Vehtari

@avehtari.bsky.social

I worked part of the afternoon outside

Photo of Finnish beach view on a sunny day

May 16, 2025 at 2:54 PM

Aki Vehtari

@avehtari.bsky.social

I'll talk about Bayesian workflow Thu 24th April 11-12 CEST in Learn Bayes seminar by Karolinska Institutet @ki.se learnbayes.se/events/bayes... (zoom available)

The focus will be different to my previous workflow talks (see users.aalto.fi/~ave/videos....). This time more flowcharts and shortcuts

A flowchart. The first box has text "Build a model" and arrow to the second box. The second box has text "Model and prior checking" and arrows to the third and fourth box. The arrow with text "not good" goes to the third box, which has text "Refine model" and an arrow back to the second box. The arrow with text "good" goes to the fourth box, which has text "Use the model".

April 23, 2025 at 10:18 AM

Aki Vehtari

@avehtari.bsky.social

A new paper with Alex Cooper and Catherine Forbes "Joint leave-group-out cross-validation in Bayesian spatial models" arxiv.org/abs/2504.15586

(Alex did the hard work for this, and running many cross-validation simulations with spatial models is hard)

Abstract: Cross-validation (CV) is a widely-used method of predictive assessment based on repeated model fits to different subsets of the available data. CV is applicable in a wide range of statistical settings. However, in cases where data are not exchangeable, the design of CV schemes should account for suspected correlation structures within the data. CV scheme designs include the selection of left-out blocks and the choice of scoring function for evaluating predictive performance. This paper focuses on the impact of two scoring strategies for block-wise CV applied to spatial models with Gaussian covariance structures. We investigate, through several experiments, whether evaluating the predictive performance of blocks of left-out observations jointly, rather than aggregating individual (pointwise) predictions, improves model selection performance. Extending recent findings for data with serial correlation (such as time-series data), our experiments suggest that joint scoring reduces the variability of CV estimates, leading to more reliable model selection, particularly when spatial dependence is strong and model differences are subtle.

April 23, 2025 at 8:22 AM

Aki Vehtari

@avehtari.bsky.social

I'm reading a few papers that use the notation $\angle\{F|X_1,X_2,\dots,X_n\}$, where F is a distribution function and the X are random variables. What does the angle symbol denote? I've not been able to find it with search engines (and one LLM says the most likely explanation is a typo)

April 14, 2025 at 4:49 PM

Aki Vehtari

@avehtari.bsky.social

The fast method gives a biased total estimate. The difference estimator corrects the bias using a few slow-to-compute estimates. In the case study, with N=407 or N=657, we get close to full brute-force LOO-CV accuracy using subsampling LOO-CV with M=50, which means 8 to 13 times faster computation. 7/

The plot shows the relevance order of the predictors and estimated predictive performance given those variables. The order is the same as in the previous plot, but now the predictive performance estimates take the search into account and have smaller bias. It seems that using just four predictors can provide similar predictive performance to using all the predictors.

March 14, 2025 at 10:33 AM
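
A minimal sketch of the difference estimator described in the post above, assuming a simple random subsample without replacement: the total of the fast approximate values over all N observations is corrected by the scaled-up average discrepancy between exact and approximate values on the M subsampled observations. The function name `difference_estimate` and all numbers except N=407 and M=50 are hypothetical, and the simulated values are stand-ins rather than real LOO-CV results.

```python
import numpy as np

def difference_estimate(approx_all, exact_sub, sub_idx):
    """Estimate the sum of exact pointwise values over all N observations.

    approx_all: fast approximate values for all N observations.
    exact_sub:  slow exact values for the M subsampled observations.
    sub_idx:    indices of the subsample (simple random sample
                without replacement).
    """
    approx_all = np.asarray(approx_all, dtype=float)
    exact_sub = np.asarray(exact_sub, dtype=float)
    n, m = approx_all.size, exact_sub.size
    # Fast (possibly biased) total plus a scaled-up bias correction.
    correction = (n / m) * np.sum(exact_sub - approx_all[sub_idx])
    return approx_all.sum() + correction

# Hypothetical stand-in data with N = 407 and M = 50.
rng = np.random.default_rng(0)
n, m = 407, 50
exact = rng.normal(-1.0, 0.5, size=n)            # stand-in for exact values
approx = exact + 0.05 + rng.normal(0, 0.01, n)   # fast values with a small bias
sub_idx = rng.choice(n, size=m, replace=False)
estimate = difference_estimate(approx, exact[sub_idx], sub_idx)
```

Only M of the N slow computations are needed, so the cost of the slow part drops by a factor of about N/M: 407/50 ≈ 8.1 and 657/50 ≈ 13.1.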

Aki Vehtari

@avehtari.bsky.social

With subsampling LOO-CV proceedings.mlr.press/v108/magnuss... using the difference estimator we can combine the fast PSIS-LOO-CV performance estimates and M

Recently, new methods for model assessment, based on subsampling and posterior approximations, have been proposed for scaling leave-one-out cross-validation (LOO-CV) to large datasets. Although these methods work well for estimating predictive performance for individual models, they are less powerful in model comparison. We propose an efficient method for estimating differences in predictive performance by combining fast approximate LOO surrogates with exact LOO sub-sampling using the difference estimator and supply proofs with regards to scaling characteristics. The resulting approach can be orders of magnitude more efficient than previous approaches, as well as being better suited to model comparison.

March 14, 2025 at 10:33 AM

Aki Vehtari

@avehtari.bsky.social

Using Pareto smoothed importance sampling (PSIS) LOO-CV, we can get a very fast predictive performance estimate along the forward selection search path, but that approach does not cross-validate the search itself, and thus gives slightly optimistic estimates. 3/

The plot shows the relevance order of the predictors and estimated predictive performance given those variables. As the search can overfit and we didn’t cross-validate the search, the performance estimates can go above the reference model performance. However, this plot helps us to see that 10 or fewer predictors would be sufficient.

March 14, 2025 at 10:33 AM

Aki Vehtari

@avehtari.bsky.social

For a projpred introduction, see doi.org/10.1214/24-S...

The use of a reference model and projection already reduces the variance in the model selection criterion so much that the amount of overfitting in forward selection of covariates is much smaller than with other approaches. 2/

Abstract
The concepts of Bayesian prediction, model comparison, and model selection have developed significantly over the last decade. As a result, the Bayesian community has witnessed a rapid growth in theoretical and applied contributions to building and selecting predictive models. Projection predictive inference in particular has shown promise to this end, finding application across a broad range of fields. It is less prone to over-fitting than naïve selection based purely on cross-validation or information criteria performance metrics, and has been known to out-perform other methods in terms of predictive performance. We survey the core concept and contemporary contributions to projection predictive inference, and present a safe, efficient, and modular workflow for prediction-oriented model selection therein. We also provide an interpretation of the projected posteriors achieved by projection predictive inference in terms of their limitations in causal settings.

March 14, 2025 at 10:33 AM

Aki Vehtari

@avehtari.bsky.social

In the case of more than two categories, we can look at the calibration plots for one-vs-others, and in the case of ordinal data we can look at the calibration of cumulative probabilities

Figure 26: Comparison between binned calibration plots and PAV-adjusted calibration plots for one versus others comparisons. We see that the predictions are well calibrated for class A, but the model seems to be confusing observations of classes B and C. The sharp rises at the extreme predictions indicate that the model is over-confident in its predicted probabilities for these classes.

March 4, 2025 at 1:15 PM
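
To make the two reductions in the post above concrete, the following sketch turns multi-class predictions into binary calibration problems: one-vs-others for categorical data and cumulative probabilities for ordinal data. The probabilities, the observed classes, and the helper names `one_vs_others` and `cumulative` are hypothetical; each resulting pair of predicted probabilities and 0/1 outcomes could then be passed to any binary calibration plot.

```python
import numpy as np

# Hypothetical predicted class probabilities for 4 observations, 3 classes.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
    [0.3, 0.4, 0.3],
])
observed = np.array([0, 1, 2, 2])  # observed class index per observation

def one_vs_others(probs, observed, k):
    """Binary calibration problem: class k versus all other classes."""
    return probs[:, k], (observed == k).astype(int)

def cumulative(probs, observed, k):
    """Ordinal calibration problem: P(class <= k) versus the event class <= k."""
    return probs[:, : k + 1].sum(axis=1), (observed <= k).astype(int)

p_a, y_a = one_vs_others(probs, observed, 0)      # first class vs the rest
p_le_b, y_le_b = cumulative(probs, observed, 1)   # first two classes vs the third
```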
