Thomas Steinke
@stein.ke
Computer science, math, machine learning, (differential) privacy
Researcher at Google DeepMind
Kiwi🇳🇿 in California🇺🇸
http://stein.ke/
IMHO, the best analog to the AI bubble is the dotcom bubble. Yes, the internet proved to be economically transformative, but there was still a bubble. Companies made a lot of money in the end, but it wasn't necessarily the ones that people expected -- e.g., see Cisco:
October 9, 2025 at 1:17 AM
🤦🤦🤦 this is not how two-factor authentication works 🤦🤦🤦🤦
August 13, 2025 at 2:29 AM
Doing linear algebra in finite fields is fun because numerical instability doesn't exist. Alas, library support is limited, so you may find yourself writing your own Gaussian elimination.
June 27, 2025 at 2:48 PM
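(If you do end up writing your own, here's a minimal sketch of Gauss–Jordan elimination over GF(p) for a prime modulus p, solving A·x = b mod p. The function name and example system are just for illustration; the nice part is that any nonzero entry is a valid pivot, since there is no rounding.)

```python
# Minimal Gauss-Jordan elimination over GF(p) for a prime modulus p.
# Solves A x = b (mod p); raises if the matrix is singular mod p.

def solve_mod_p(A, b, p):
    n = len(A)
    # Augmented matrix [A | b], reduced mod p.
    M = [[A[i][j] % p for j in range(n)] + [b[i] % p] for i in range(n)]
    for col in range(n):
        # Any nonzero entry is a fine pivot -- no numerical concerns.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular mod p")
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)  # modular inverse
        M[col] = [(v * inv) % p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [(M[r][j] - f * M[col][j]) % p for j in range(n + 1)]
    return [M[i][n] for i in range(n)]

# Example: solve a 2x2 system mod 7.
print(solve_mod_p([[2, 3], [1, 4]], [1, 2], 7))  # -> [1, 2]
```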
Wait, what? 🤔
May 31, 2025 at 10:23 PM
What's the full acronym then? 🤔
May 22, 2025 at 8:19 PM
I'm a fan of the Jensen proof. It generalizes to prove Hölder's inequality:
May 14, 2025 at 6:06 PM
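(For reference, here is one way a Jensen argument gives Hölder, sketched generically and not necessarily the proof in the image above. Take a_i, b_i ≥ 0 with 1/p + 1/q = 1, p > 1, and, by rescaling, ∑ b_i^q = 1. With weights w_i = b_i^q and the convex function t ↦ t^p,

(∑ a_i·b_i)^p = (∑ w_i · a_i·b_i^{1−q})^p ≤ ∑ w_i · a_i^p·b_i^{p(1−q)} = ∑ a_i^p · b_i^{q+p−pq} = ∑ a_i^p,

since p + q = pq. Taking p-th roots and undoing the rescaling gives ∑ a_i·b_i ≤ ∥a∥_p ∥b∥_q.)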
Three proofs of Cauchy-Schwarz.

⟨x,y⟩ ≤ ∥x∥ ∥y∥

Are there any others you know of?
May 14, 2025 at 6:06 PM
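(One classical argument, which may already be among the three above: for real vectors and any t, 0 ≤ ∥x − t·y∥² = ∥x∥² − 2t⟨x,y⟩ + t²∥y∥²; choosing t = ⟨x,y⟩/∥y∥² for y ≠ 0 gives ⟨x,y⟩² ≤ ∥x∥²·∥y∥².)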
Suppose X,Y,Z,W are independent standard Gaussians.
Then X·Y+W·Z has a standard Laplace distribution.
Similarly, Z·√(X^2+Y^2) has a standard Laplace distribution.
May 7, 2025 at 3:32 PM
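(A quick Monte Carlo sanity check of the claims above, just matching a few moments of the standard Laplace, which has mean 0, variance 2, and mean absolute value 1; a sketch, not a proof.)

```python
# Empirically compare the two constructions against standard Laplace moments.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X, Y, Z, W = rng.standard_normal((4, n))

T1 = X * Y + W * Z              # claimed standard Laplace
T2 = Z * np.sqrt(X**2 + Y**2)   # also claimed standard Laplace

for name, T in [("X*Y + W*Z", T1), ("Z*sqrt(X^2+Y^2)", T2)]:
    print(name, "mean≈", T.mean().round(3),
          "var≈", T.var().round(3),
          "E|T|≈", np.abs(T).mean().round(3))
# Standard Laplace (density e^{-|t|}/2): mean 0, variance 2, E|T| = 1.
```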
As an application of this, we get to prove concentrated differential privacy for the restricted Gaussian mechanism.

E.g. if you have a bounded query and add Gaussian noise, you can condition the noisy output to also be bounded without any loss in privacy parameters. 😁
April 27, 2025 at 1:38 AM
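(A minimal sketch of such a restricted mechanism, with hypothetical function name, bounds, and noise scale; rejection sampling is one way to realize the conditioning on a bounded output. The no-extra-privacy-cost claim is the point of the post above, not something this code checks.)

```python
# Add Gaussian noise to a bounded query and condition the output to stay in
# the same range, by resampling the noise until the result lands inside.
import numpy as np

def restricted_gaussian(query_value, lo, hi, sigma, rng):
    """Sample from N(query_value, sigma^2) conditioned on [lo, hi]."""
    assert lo <= query_value <= hi  # the query itself is assumed bounded
    while True:
        out = query_value + sigma * rng.standard_normal()
        if lo <= out <= hi:
            return out

rng = np.random.default_rng(0)
print(restricted_gaussian(query_value=0.7, lo=0.0, hi=1.0, sigma=0.3, rng=rng))
```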
Here's an application: using the log-Sobolev inequality for strongly log-concave distributions to bound KL divergence, which can then be converted into a bound on Rényi divergence.
April 27, 2025 at 1:36 AM
You can bound Rényi divergences in terms of KL divergences for tilted distributions. This is useful, e.g., for Gaussians, where tilting just corresponds to shifting the distribution.
April 26, 2025 at 8:03 PM
There are also rod cells in your retina. In principle, these give you a 4th dimension for perceiving colour. But they are for peripheral & night vision, so we don't perceive a 4th colour dimension. 🤷
April 19, 2025 at 10:20 PM
Colours correspond to infinite-dimensional vectors, since there are infinitely many wavelengths of light.

But humans can only perceive a three-dimensional projection of colour (red, green, & blue).

What's interesting is that it's *not* an orthogonal projection. Here's a plot of the basis vectors.
April 19, 2025 at 10:12 PM
Taking α→1 gives a triangle inequality for KL divergence. This can also be proved using my favourite lemma. 😁
April 19, 2025 at 5:44 PM
Rényi divergences satisfy a triangle inequality (with an extra multiplier).
The proof boils down to Hölder's inequality.
April 19, 2025 at 5:44 PM
Here's a very simple calculation showing that adding a bit of randomization can make numerical integration better even in the one-dimensional setting.
April 4, 2025 at 6:27 PM
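(One standard illustration of this kind of effect, not necessarily the exact calculation in the image above: the deterministic left-endpoint rule with n cells has error on the order of 1/n, while placing one uniformly random point in each cell gives an unbiased estimate whose typical error shrinks like n^(-3/2).)

```python
# Compare a deterministic left-endpoint Riemann sum with its stratified
# randomized counterpart on the integral of exp(x) over [0, 1].
import numpy as np

rng = np.random.default_rng(0)
f = np.exp
truth = np.e - 1.0  # integral of exp on [0, 1]

for n in [10, 100, 1000]:
    edges = np.arange(n) / n                 # left endpoints of the n cells
    left_riemann = f(edges).mean()           # deterministic rule
    random_pts = edges + rng.random(n) / n   # one uniform point per cell
    stratified = f(random_pts).mean()        # randomized rule
    print(f"n={n:5d}  left-Riemann error={abs(left_riemann - truth):.2e}"
          f"  stratified error={abs(stratified - truth):.2e}")
```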
Beyond a uniform bound (a.k.a. Kolmogorov–Smirnov distance), we can also get a universal multiplicative bound. This is tighter in the tails of the distribution.
March 29, 2025 at 5:08 PM
The DKW inequality states that, given i.i.d. samples from a univariate distribution, with high probability the empirical CDF is *uniformly* close to the true CDF.

The uniform guarantee is as tight as the pointwise guarantee.

(Alas, I couldn't get this proof down to 1 page. 😅 )
March 29, 2025 at 5:08 PM
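(A small numerical check of the DKW guarantee, as a sketch: with n i.i.d. samples, sup_x |empirical CDF − true CDF| ≤ √(log(2/δ)/(2n)) with probability at least 1 − δ.)

```python
# Draw n samples from a standard normal, measure the sup-distance between the
# empirical and true CDFs, and compare it to the DKW radius.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, delta = 10_000, 0.05
x = np.sort(rng.standard_normal(n))

ecdf_hi = np.arange(1, n + 1) / n   # empirical CDF just after each sample
ecdf_lo = np.arange(0, n) / n       # empirical CDF just before each sample
true_cdf = norm.cdf(x)
sup_dev = max(np.max(np.abs(ecdf_hi - true_cdf)),
              np.max(np.abs(ecdf_lo - true_cdf)))

dkw_radius = np.sqrt(np.log(2 / delta) / (2 * n))
print(f"observed sup deviation = {sup_dev:.4f}, DKW radius = {dkw_radius:.4f}")
```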
The quotient rule for higher derivatives is not as messy as I feared. 😅
And the matrix is lower-triangular, so it's easy to invert.
March 27, 2025 at 3:20 PM
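(One way to see where the lower-triangular structure comes from; this is a generic observation, not necessarily the derivation in the image above. Write h = f/g, so f = h·g, and apply the Leibniz rule:

f^(n) = ∑_{k=0}^{n} C(n,k) · h^(k) · g^(n−k).

For n = 0, 1, 2, … these equations are linear in the unknowns h^(0), h^(1), …, and the coefficient of h^(n) in the n-th equation is just g, so the system is lower-triangular and can be solved by forward substitution whenever g ≠ 0.)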
L. J. Mordell effectively republished the formula in 1933, and that paper is available. In it, he laments that his 1920 paper is not well known.
doi.org/10.1007/BF02...
March 26, 2025 at 3:41 PM
This formula is known as the Mordell integral.
If you try looking up Mordell's original paper from 1920, you get nothing: 🙃
March 26, 2025 at 3:41 PM
I don't know how someone came up with this crazy formula for the mean of a logit-Normal, but I'm glad they did. It converges extremely fast.
en.wikipedia.org/wiki/Logit-n...
March 26, 2025 at 3:41 PM
Integration is to differentiation as NP is to P.
March 25, 2025 at 4:48 PM
Upper bounds like this are particularly useful, e.g., if you need to bound the expectation E[log(1+exp(X))].

This bound is a reformulation of Proposition 4.1 of arxiv.org/abs/1901.09188
or Equation 1.3 in
doi.org/10.1214/ECP....
March 24, 2025 at 4:26 PM
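(To spell out the expectation step, with c₀, c₁, c₂ standing in generically for whatever quadratic upper bound is used: if log(1+exp(x)) ≤ c₀ + c₁·x + c₂·x² for all x, then for any X with mean μ and variance σ²,

E[log(1+exp(X))] ≤ c₀ + c₁·μ + c₂·(μ² + σ²),

since E[X²] = μ² + σ².)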
This is known as the Kearns-Saul inequality, which improves Hoeffding's lemma. It gives the optimal constant (independent of a) coefficient for the quadratic term.

It matches the Taylor series in constant & linear terms.

See how the upper bound compares to the 2nd-order Taylor series:
March 24, 2025 at 4:26 PM