Thom Volker
thomvolker.bsky.social
PhD Candidate in Statistics, Utrecht University

Creates fake data for a living.

thomvolker.github.io
November 14, 2025 at 10:30 AM
fml
November 5, 2025 at 11:56 PM
Anarchy!

(for a presentation, having to set "print = FALSE" all the time is annoying)
November 5, 2025 at 4:10 PM
“Have you thought about including…”
Sir, may I first introduce my team and myself?
October 22, 2025 at 12:01 PM
How cool is this?! Positron opens an HTML color picker when you type an HTML color in your script.

#RStats #dataviz
October 6, 2025 at 8:17 PM
Elsevier, wtf is this supposed to mean?
September 30, 2025 at 8:02 AM
The prediction intervals scale adaptively with the fraction of missing information, yielding wider intervals for cases with more severe missingness. The prediction intervals remain confidence valid, regardless of whether missingness occurs in train and/or test data (preprint on this coming up).
September 29, 2025 at 1:32 PM
Cool stuff!

Florian van Leeuwen and I implemented a prediction function in the #mice package that allows the incorporation of missing data uncertainty in a prediction interval.

The `predict_mi()` function is available in the current development version: github.com/amices/mice

#Rstats #statsky
September 29, 2025 at 1:32 PM
Had a blast cycling on Corsica today!
September 9, 2025 at 7:48 PM
Been working on a tutorial on synthetic data for open science for @lmu-osc.bsky.social
A draft version is now up: lmu-osc.github.io/synthetic-da...

It covers model building, evaluating synthetic data utility with density ratio estimation, and disclosure risk.

Feedback is very welcome!
August 21, 2025 at 3:11 PM
TIL that you can also round to tens or hundreds, or larger numbers, by specifying a negative "digits" argument in round().

#Rstats
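
A minimal sketch of this behaviour, using made-up example values (the vector `x` is purely illustrative): a negative `digits` moves the rounding position to the left of the decimal point.

```r
# Illustrative values, not taken from the post
x <- c(1234.567, 45678.9)

tens      <- round(x, digits = -1)  # nearest ten:      1230 45680
hundreds  <- round(x, digits = -2)  # nearest hundred:  1200 45700
thousands <- round(x, digits = -3)  # nearest thousand: 1000 46000

print(rbind(tens, hundreds, thousands))
```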
August 19, 2025 at 8:37 AM
Never not thinking about this Tumblr post in this "AI for everything"-era

"... should I have a computer that eats my dinner and fucks my wife?"
August 14, 2025 at 9:12 AM
It turns out that arrays are all you need.
August 9, 2025 at 8:07 AM
I'd say that adding this large constant effectively undoes the log-transformation but applies a linear transformation: log(y+a) ~ log a + y / k (k is some constant; assuming y is small relative to a). So, different scale but same fit. Basically @bbolker.bsky.social's 1st order Taylor approximation.
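
The approximation can be written out as follows, assuming the natural logarithm, in which case the constant k equals a (for a base-b logarithm it would be a ln b):

```latex
\log(y + a) = \log a + \log\!\left(1 + \frac{y}{a}\right)
            = \log a + \frac{y}{a} - \frac{y^2}{2a^2} + \dots
            \approx \log a + \frac{y}{a}
            \qquad \text{for } |y| \ll a .
```

The right-hand side is linear in y, so a regression on log(y + a) with large a behaves like a regression on a shifted and rescaled y.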
August 1, 2025 at 9:19 AM
Greetings from the Alps!
July 26, 2025 at 5:23 PM
Best part of going to a conference is that you can append a trip to the Dolomites to it
July 24, 2025 at 5:47 PM
Some time ago, I generated synthetic data using GANs for some simple examples, to understand GANs a bit better. It turned out to be harder than I thought to tune these things appropriately. Anyway, I turned my struggles into a blog post that might interest some of you: thomvolker.github.io/blog/1407_ga...
July 14, 2025 at 9:36 PM
It took me only one day to get convinced of Positron. How cool is it that TODOs in Quarto can be linked to GitHub issues in a single click!

*sorry for typos and weird text, there are TODOs here for a reason
July 11, 2025 at 8:41 PM
I'm beginning to doubt that training Grok on a mixture of 4chan and Elon tweets was a good idea.
July 9, 2025 at 6:32 AM
I cannot be convinced there exists a better title than "ItJustAintDopeToDropTheSlope.pdf" (which is what the PDF on OSF is called).
July 8, 2025 at 7:00 AM
My university mailbox moves them straight to the spam folder:
July 4, 2025 at 6:47 AM
This more than ever
July 1, 2025 at 10:32 PM
June 30, 2025 at 8:00 AM
MICE is also not supposed to work in this case:

From the "Flexible Imputation of Missing Data" book by Stef van Buuren (section 2.7 "When not to use multiple imputation")
June 26, 2025 at 11:28 AM
After two weeks, I'm finally done!

In this post, I explain different approaches for solving linear regression in R: directly, using QR, singular value and Cholesky decompositions, and do some benchmarking for comparison with in-built approaches.

thomvolker.github.io/blog/2506_re...
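
A rough sketch of the four approaches on simulated data, not the code from the blog post: the data, the seed, and all object names are assumptions made for illustration, and the benchmarking step is left out. Each estimate is compared with base R's `lm.fit()`.

```r
# Simulated data (illustrative only)
set.seed(1)
n <- 200
X <- cbind(1, matrix(rnorm(n * 3), n, 3))  # design matrix with intercept column
y <- drop(X %*% c(2, -1, 0.5, 3) + rnorm(n))

XtX <- crossprod(X)     # X'X
Xty <- crossprod(X, y)  # X'y

# 1. Directly: solve the normal equations X'X b = X'y
b_direct <- solve(XtX, Xty)

# 2. QR decomposition of X
b_qr <- qr.coef(qr(X), y)

# 3. Cholesky: X'X = R'R with R upper triangular,
#    so solve R'z = X'y, then R b = z
R <- chol(XtX)
b_chol <- backsolve(R, forwardsolve(t(R), Xty))

# 4. Singular value decomposition: X = U D V', so b = V D^{-1} U'y
s <- svd(X)
b_svd <- s$v %*% (crossprod(s$u, y) / s$d)

# Collect the estimates next to the built-in least-squares fit
estimates <- cbind(
  direct = drop(b_direct),
  qr     = unname(b_qr),
  chol   = drop(b_chol),
  svd    = drop(b_svd),
  lm     = unname(lm.fit(X, y)$coefficients)
)
print(round(estimates, 6))
```

All columns should agree up to floating-point error on a well-conditioned problem like this one; the approaches differ mainly in speed and numerical stability.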
June 18, 2025 at 2:22 PM