Thomas Steinke
@stein.ke
Computer science, math, machine learning, (differential) privacy
Researcher at Google DeepMind
Kiwi🇳🇿 in California🇺🇸
http://stein.ke/
IMHO, the best analog to the AI bubble is the dotcom bubble. Yes, the internet proved to be economically transformative, but there was still a bubble. Companies made a lot of money in the end, but it wasn't necessarily the ones that people expected -- e.g., see Cisco:
October 9, 2025 at 1:17 AM
🤦🤦🤦 this is not how two-factor authentication works 🤦🤦🤦🤦
August 13, 2025 at 2:29 AM
Doing linear algebra in finite fields is fun because numerical instability doesn't exist. Alas, library support is limited, so you may find yourself writing your own Gaussian elimination.
June 27, 2025 at 2:48 PM
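(If you do end up writing your own, here's a minimal sketch of Gauss–Jordan elimination over GF(p) for a prime modulus p, solving A·x = b mod p. The function name and example system are just for illustration; the nice part is that any nonzero entry is a valid pivot, since there is no rounding.)

```python
# Minimal Gauss-Jordan elimination over GF(p) for a prime modulus p.
# Solves A x = b (mod p); raises if the matrix is singular mod p.

def solve_mod_p(A, b, p):
    n = len(A)
    # Augmented matrix [A | b], reduced mod p.
    M = [[A[i][j] % p for j in range(n)] + [b[i] % p] for i in range(n)]
    for col in range(n):
        # Any nonzero entry is a fine pivot -- no numerical concerns.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular mod p")
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)  # modular inverse
        M[col] = [(v * inv) % p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [(M[r][j] - f * M[col][j]) % p for j in range(n + 1)]
    return [M[i][n] for i in range(n)]

# Example: solve a 2x2 system mod 7.
print(solve_mod_p([[2, 3], [1, 4]], [1, 2], 7))  # -> [1, 2]
```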
Wait, what? 🤔
May 31, 2025 at 10:23 PM
What's the full acronym then? 🤔
May 22, 2025 at 8:19 PM
I'm a fan of the Jensen proof. It generalizes to prove Hölder's inequality:
May 14, 2025 at 6:06 PM
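(For reference, here is one way a Jensen argument gives Hölder, sketched generically and not necessarily the proof in the image above. Take a_i, b_i ≥ 0 with 1/p + 1/q = 1, p > 1, and, by rescaling, ∑ b_i^q = 1. With weights w_i = b_i^q and the convex function t ↦ t^p,

(∑ a_i·b_i)^p = (∑ w_i · a_i·b_i^{1−q})^p ≤ ∑ w_i · a_i^p·b_i^{p(1−q)} = ∑ a_i^p · b_i^{q+p−pq} = ∑ a_i^p,

since p + q = pq. Taking p-th roots and undoing the rescaling gives ∑ a_i·b_i ≤ ∥a∥_p ∥b∥_q.)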
Three proofs of Cauchy-Schwarz.

⟨x,y⟩ ≤ ∥x∥ ∥y∥

Are there any others you know of?
May 14, 2025 at 6:06 PM
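(One classical argument, which may already be among the three above: for real vectors and any t, 0 ≤ ∥x − t·y∥² = ∥x∥² − 2t⟨x,y⟩ + t²∥y∥²; choosing t = ⟨x,y⟩/∥y∥² for y ≠ 0 gives ⟨x,y⟩² ≤ ∥x∥²·∥y∥².)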
Suppose X,Y,Z,W are independent standard Gaussians.
Then X·Y+W·Z has a standard Laplace distribution.
Similarly, Z·√(X^2+Y^2) has a standard Laplace distribution.
May 7, 2025 at 3:32 PM
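(A quick Monte Carlo sanity check of the claims above, just matching a few moments of the standard Laplace, which has mean 0, variance 2, and mean absolute value 1; a sketch, not a proof.)

```python
# Empirically compare the two constructions against standard Laplace moments.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X, Y, Z, W = rng.standard_normal((4, n))

T1 = X * Y + W * Z              # claimed standard Laplace
T2 = Z * np.sqrt(X**2 + Y**2)   # also claimed standard Laplace

for name, T in [("X*Y + W*Z", T1), ("Z*sqrt(X^2+Y^2)", T2)]:
    print(name, "mean≈", T.mean().round(3),
          "var≈", T.var().round(3),
          "E|T|≈", np.abs(T).mean().round(3))
# Standard Laplace (density e^{-|t|}/2): mean 0, variance 2, E|T| = 1.
```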
As an application of this, we get to prove concentrated differential privacy for the restricted Gaussian mechanism.

E.g. if you have a bounded query and add Gaussian noise, you can condition the noisy output to also be bounded without any loss in privacy parameters. 😁
April 27, 2025 at 1:38 AM
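(A minimal sketch of such a restricted mechanism, with hypothetical function name, bounds, and noise scale; rejection sampling is one way to realize the conditioning on a bounded output. The no-extra-privacy-cost claim is the point of the post above, not something this code checks.)

```python
# Add Gaussian noise to a bounded query and condition the output to stay in
# the same range, by resampling the noise until the result lands inside.
import numpy as np

def restricted_gaussian(query_value, lo, hi, sigma, rng):
    """Sample from N(query_value, sigma^2) conditioned on [lo, hi]."""
    assert lo <= query_value <= hi  # the query itself is assumed bounded
    while True:
        out = query_value + sigma * rng.standard_normal()
        if lo <= out <= hi:
            return out

rng = np.random.default_rng(0)
print(restricted_gaussian(query_value=0.7, lo=0.0, hi=1.0, sigma=0.3, rng=rng))
```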
Here's an application: using the log-Sobolev inequality for strongly log-concave distributions to bound KL divergence, which can then be converted into a bound on Rényi divergence.
April 27, 2025 at 1:36 AM
You can bound Rényi divergences in terms of KL divergences for tilted distributions. This is useful, e.g., for Gaussians, where tilting just corresponds to shifting the distribution.
April 26, 2025 at 8:03 PM
There are also rod cells in your retina. In principle, these give you a 4th dimension for perceiving colour. But they are for peripheral & night vision, so we don't perceive a 4th colour dimension. 🤷
April 19, 2025 at 10:20 PM
Colours correspond to infinite-dimensional vectors, since there are infinitely many wavelengths of light.

But humans can only perceive a three-dimensional projection of colour (red, green, & blue).

What's interesting is that it's *not* an orthogonal projection. Here's a plot of the basis vectors.
April 19, 2025 at 10:12 PM
Taking α→1 gives a triangle inequality for KL divergence. This can also be proved using my favourite lemma. 😁
April 19, 2025 at 5:44 PM
Rényi divergences satisfy a triangle inequality (with an extra multiplier).
The proof boils down to Hölder's inequality.
April 19, 2025 at 5:44 PM
Here's a very simple calculation showing that adding a bit of randomization can make numerical integration better even in the one-dimensional setting.
April 4, 2025 at 6:27 PM
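(One standard illustration of this kind of effect, not necessarily the exact calculation in the image above: the deterministic left-endpoint rule with n cells has error on the order of 1/n, while placing one uniformly random point in each cell gives an unbiased estimate whose typical error shrinks like n^(-3/2).)

```python
# Compare a deterministic left-endpoint Riemann sum with its stratified
# randomized counterpart on the integral of exp(x) over [0, 1].
import numpy as np

rng = np.random.default_rng(0)
f = np.exp
truth = np.e - 1.0  # integral of exp on [0, 1]

for n in [10, 100, 1000]:
    edges = np.arange(n) / n                 # left endpoints of the n cells
    left_riemann = f(edges).mean()           # deterministic rule
    random_pts = edges + rng.random(n) / n   # one uniform point per cell
    stratified = f(random_pts).mean()        # randomized rule
    print(f"n={n:5d}  left-Riemann error={abs(left_riemann - truth):.2e}"
          f"  stratified error={abs(stratified - truth):.2e}")
```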
Beyond a uniform bound (a.k.a. Kolmogorov–Smirnov distance), we can also get a universal multiplicative bound. This is tighter in the tails of the distribution.
March 29, 2025 at 5:08 PM
The DKW inequality states that, given i.i.d. samples from a univariate distribution, with high probability the empirical CDF is *uniformly* close to the true CDF.

The uniform guarantee is as tight as the pointwise guarantee.

(Alas, I couldn't get this proof down to 1 page. 😅 )
March 29, 2025 at 5:08 PM
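(A small numerical check of the DKW guarantee, as a sketch: with n i.i.d. samples, sup_x |empirical CDF − true CDF| ≤ √(log(2/δ)/(2n)) with probability at least 1 − δ.)

```python
# Draw n samples from a standard normal, measure the sup-distance between the
# empirical and true CDFs, and compare it to the DKW radius.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, delta = 10_000, 0.05
x = np.sort(rng.standard_normal(n))

ecdf_hi = np.arange(1, n + 1) / n   # empirical CDF just after each sample
ecdf_lo = np.arange(0, n) / n       # empirical CDF just before each sample
true_cdf = norm.cdf(x)
sup_dev = max(np.max(np.abs(ecdf_hi - true_cdf)),
              np.max(np.abs(ecdf_lo - true_cdf)))

dkw_radius = np.sqrt(np.log(2 / delta) / (2 * n))
print(f"observed sup deviation = {sup_dev:.4f}, DKW radius = {dkw_radius:.4f}")
```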
The quotient rule for higher derivatives is not as messy as I feared. 😅
And the matrix is lower-triangular, so it's easy to invert.
March 27, 2025 at 3:20 PM
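(One way to see where the lower-triangular structure comes from; this is a generic observation, not necessarily the derivation in the image above. Write h = f/g, so f = h·g, and apply the Leibniz rule:

f^(n) = ∑_{k=0}^{n} C(n,k) · h^(k) · g^(n−k).

For n = 0, 1, 2, … these equations are linear in the unknowns h^(0), h^(1), …, and the coefficient of h^(n) in the n-th equation is just g, so the system is lower-triangular and can be solved by forward substitution whenever g ≠ 0.)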
L. J. Mordell effectively republished the formula in 1933, and that paper is available. In it, he laments that his 1920 paper is not well known.
doi.org/10.1007/BF02...
March 26, 2025 at 3:41 PM
This formula is known as the Mordell integral.
If you try looking up Mordell's original paper from 1920, you get nothing: 🙃
March 26, 2025 at 3:41 PM
I don't know how someone came up with this crazy formula for the mean of a logit-Normal, but I'm glad they did. It converges extremely fast.
en.wikipedia.org/wiki/Logit-n...
March 26, 2025 at 3:41 PM
Integration is to differentiation as NP is to P.
March 25, 2025 at 4:48 PM
Upper bounds like this are particularly useful, e.g., if you need to bound the expectation E[log(1+exp(X))].

This bound is a reformulation of Proposition 4.1 of arxiv.org/abs/1901.09188
or Equation 1.3 in
doi.org/10.1214/ECP....
March 24, 2025 at 4:26 PM
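(To spell out the expectation step, with c₀, c₁, c₂ standing in generically for whatever quadratic upper bound is used: if log(1+exp(x)) ≤ c₀ + c₁·x + c₂·x² for all x, then for any X with mean μ and variance σ²,

E[log(1+exp(X))] ≤ c₀ + c₁·μ + c₂·(μ² + σ²),

since E[X²] = μ² + σ².)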
This is known as the Kearns-Saul inequality, which improves Hoeffding's lemma. It gives the optimal constant (independent of a) coefficient for the quadratic term.

It matches the Taylor series in constant & linear terms.

See how the upper bound compares to the 2nd-order Taylor series:
March 24, 2025 at 4:26 PM