Alex Shtoff
alexshtf.bsky.social
Principal scientist @ TII
Visit my research blog at https://alexshtf.github.io
Pinned
After a short time here, it's time for a short intro thread.
I like things at the intersection of optimization, numerical analysis, ML, and software engineering. I have a blog, where I write about stuff I like:
alexshtf.github.io

You're welcome to add me to starter packs as you see fit.
Alex Shtoff
Blog on optimization, machine learning, and software development.
alexshtf.github.io
Reposted by Alex Shtoff
Nicely written blog post by David Eppstein on the Boyer–Moore (deterministic) streaming algorithm to find a majority element in a stream, and its extensions, first to the turnstile model, and then to frequency estimation (Misra–Gries).
11011110.github.io/blog/2025/05... via @theory.report
Turnstile majority
A famous algorithm of Boyer and Moore for the majority problem finds a majority element in a stream of elements while storing only two values, a single tenta...
11011110.github.io
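For readers who have not seen it, here is a minimal Python sketch of the classic Boyer–Moore majority vote that the post builds on. It covers only the basic insert-only version, not the turnstile or Misra–Gries extensions, and the function names are made up for illustration. The streaming pass keeps just two values (a tentative candidate and a counter); a second pass is needed to confirm that the candidate really is a majority.

```python
def majority_candidate(stream):
    """One pass, two stored values: a tentative candidate and a counter."""
    candidate, count = None, 0
    for item in stream:
        if count == 0:
            # No surviving candidate: adopt the current item.
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            # A different item cancels one vote for the candidate.
            count -= 1
    return candidate


def majority(items):
    """Return the strict majority element of a list, or None if there is none."""
    candidate = majority_candidate(items)
    # Second pass: the first pass alone cannot tell whether a majority exists.
    return candidate if 2 * items.count(candidate) > len(items) else None


with_majority = majority([1, 2, 1, 3, 1, 1, 2])
without_majority = majority([1, 2, 3])
```

If a strict majority element exists, the first pass is guaranteed to end with it as the candidate; otherwise the candidate is arbitrary, which is why the verification pass matters.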
May 6, 2025 at 1:30 PM
Reposted by Alex Shtoff
The Matrix Mortality Problem asks if a given set of square matrices can multiply to the zero matrix after a finite sequence of multiplications of elements. It is undecidable for matrices of size 3x3 or larger. buff.ly/lLmvvlo
May 1, 2025 at 5:01 AM
Attending #ICLR2025?
Visit our poster!
A stochastic approach to the subset selection problem via mirror descent.
Today, 3pm, poster #336.
April 26, 2025 at 1:59 AM
A question to the #math people here. For differential equations there are spectral methods that find approximate solutions in the span of orthogonal bases. Is there a variant for difference equations, and bases of sequences? A good tutorial maybe?
April 12, 2025 at 6:42 AM
Reposted by Alex Shtoff
The Tarski-Seidenberg theorem in logical form states that the first-order theory of the real numbers, viewed as an ordered field, admits quantifier elimination. This means any formula with quantifiers can be converted into an equivalent quantifier-free formula. perso.univ-rennes1.fr/michel.coste...
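A standard textbook illustration of quantifier elimination over the reals (not taken from the linked notes) is the solvability of a monic quadratic: the existential quantifier over x is replaced by a polynomial inequality in the coefficients alone.

```latex
\exists x \in \mathbb{R} :\; x^2 + b x + c = 0
\quad\Longleftrightarrow\quad
b^2 - 4c \ge 0
```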
April 1, 2025 at 5:00 AM
🚨New post🚨

@beenwrekt.bsky.social recently started a bit of noise with his post about nonexistence of overfitting, but he has a point. In this post we explore it using simple polynomial curve fitting, *without regularization*, using another interesting basis.

alexshtf.github.io/2025/03/27/F...
March 31, 2025 at 1:22 PM
Reposted by Alex Shtoff
On the Detection of Reviewer-Author Collusion Rings From Paper Bidding

Steven Jecmen, Nihar B Shah, Fei Fang, Leman Akoglu

Action editor: Laurent Charlin

https://openreview.net/forum?id=o58uy91V2V

#collusion #colluders #fraud
January 14, 2025 at 5:07 AM
Reposted by Alex Shtoff
Function Basis Encoding of Numerical Features in Factorization Machines

Alex Shtoff, Elie Abboud, Rotem Stram, Oren Somekh

Action editor: Andriy Mnih

https://openreview.net/forum?id=M4222IBHsh

#factorization #feature #features
January 11, 2025 at 3:07 PM
Reposted by Alex Shtoff
Videos of the CNRS optimization conference now online (in French):
- Claire Mathieu : www.youtube.com/watch?v=_ZXZ...
- Gabriel Peyré : www.youtube.com/watch?v=vQOF...
- Jérôme Bolte : www.youtube.com/watch?v=tjkg...
- Axel Parmentier : www.youtube.com/watch?v=DohO...

Enjoy 🙂
www.youtube.com
January 7, 2025 at 6:29 AM
Fellow AI researchers. Please watch this video. Freya raises a valid concern about the sheer abuse of GenAI on the web, and the damage it does.
live in about 10 hours from now - subscribe/mark your calendars/tell your friends/share etc. c:

Generative AI is a Parasitic Cancer
www.youtube.com/watch?v=-opB...
YouTube video by Freya Holmér
www.youtube.com
January 4, 2025 at 6:31 PM
[1/4] When working on ads at Yahoo, we had several 'ad hoc' solutions for various problems, and one of them was an exponential moving average (EMA) of observations y₁, y₂, …:
xᵢ₊₁=(1 - α)xᵢ+αyᵢ
One of the most overlooked facts is that it is actually online gradient descent!
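A minimal Python sketch of that equivalence, under the assumption that the per-observation loss is ½(x − yᵢ)² and that the step size equals α (the function names are illustrative). One gradient step gives x − α(x − yᵢ) = (1 − α)x + αyᵢ, which is exactly the EMA update.

```python
def ema(ys, alpha, x0=0.0):
    """Exponential moving average: x <- (1 - alpha) * x + alpha * y."""
    x, path = x0, []
    for y in ys:
        x = (1 - alpha) * x + alpha * y
        path.append(x)
    return path


def ogd_squared_loss(ys, alpha, x0=0.0):
    """Online gradient descent on f_i(x) = 0.5 * (x - y_i)**2 with step size alpha."""
    x, path = x0, []
    for y in ys:
        grad = x - y          # derivative of 0.5 * (x - y)**2 with respect to x
        x = x - alpha * grad  # plain gradient step
        path.append(x)
    return path


ys = [1.0, 3.0, 2.0, 5.0, 4.0]
ema_path = ema(ys, alpha=0.1)
ogd_path = ogd_squared_loss(ys, alpha=0.1)

# The two trajectories agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(ema_path, ogd_path))
```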
January 4, 2025 at 1:47 PM
🚀 New Paper 🚀

This post is about our recent TMLR paper, "Function Basis Encoding of Numerical Features in Factorization Machines", by Alex Shtoff, Elie Abboud, Rotem Stram, and Oren Somekh.

Paper: openreview.net/forum?id=M42...
Code: github.com/alexshtf/con...
January 1, 2025 at 9:19 AM
ICLR area chairs reviewing papers
December 30, 2024 at 5:58 PM
Reposted by Alex Shtoff
Your Classifier Can Be Secretly a Likelihood-Based OOD Detector

Jirayu Burapacheep, Yixuan Li

Action editor: Changjian Shui

https://openreview.net/forum?id=FmA1JPWBM8

#classifiers #classifier #classification
December 26, 2024 at 3:06 PM
One of the overlooked properties of the proximal operator is the formula for composing with a semi-orthogonal matrix:
h(x) = p(A x + b), with A Aᵀ = α I

It is [1]:
proxₜₕ(x) = x + α⁻¹ Aᵀ(prox_{αtp}(A x + b) − (A x + b))

[1] Combettes, Wajs. Signal Recovery by Proximal Forward-Backward Splitting.
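A small numerical sanity check, as a sketch under assumptions not in the post: p is taken to be the ℓ₁ norm (so its prox is soft-thresholding), A is a randomly generated matrix with A Aᵀ = α I, and the identity is used in the form found in standard references, prox of t·h at x equals x + α⁻¹ Aᵀ(prox of (αt)·p at (A x + b), minus (A x + b)). The check confirms that the resulting point has a lower prox objective than nearby random perturbations.

```python
import numpy as np


def soft_threshold(z, tau):
    """Prox of tau * ||.||_1, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def prox_composed(x, A, b, alpha, t):
    """Prox of t * h at x, where h(x) = ||A x + b||_1 and A A^T = alpha * I."""
    z = A @ x + b
    return x + A.T @ (soft_threshold(z, alpha * t) - z) / alpha


rng = np.random.default_rng(0)
m, n, alpha, t = 3, 6, 2.0, 0.5

# Build A (m x n) with A A^T = alpha * I from a matrix with orthonormal columns.
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))  # Q is n x m
A = np.sqrt(alpha) * Q.T
b = rng.standard_normal(m)
x = rng.standard_normal(n)


def objective(u):
    """The function that the prox of t * h at x minimizes over u."""
    return t * np.abs(A @ u + b).sum() + 0.5 * np.sum((u - x) ** 2)


u_star = prox_composed(x, A, b, alpha, t)

# The objective is strongly convex, so every perturbed point must do worse.
trial_values = [objective(u_star + 1e-3 * rng.standard_normal(n)) for _ in range(1000)]
is_minimizer = objective(u_star) <= min(trial_values)
```

The perturbation test is only a necessary check, not a proof; it is meant to catch mistakes such as a wrong scaling of the inner prox parameter.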
December 15, 2024 at 1:04 PM
Help me here #ML/#LLM X please. I'm pretty sure someone already thought of the extremely simple idea of augmenting each attention layer with additional learnable auxiliary memory in the form of embeddings. Could you point me to papers?
December 13, 2024 at 1:32 PM
M-Estimation is all you need
Change my mind :)

www.jstor.org/stable/3087324
The Calculus of M-Estimation on JSTOR
Leonard A. Stefanski, Dennis D. Boos, The Calculus of M-Estimation, The American Statistician, Vol. 56, No. 1 (Feb., 2002), pp. 29-38
www.jstor.org
December 12, 2024 at 10:34 AM
Reposted by Alex Shtoff
exciting new work by my truly brilliant postdoc Eugenio Clerico on the optimality of coin-betting strategies for mean estimation!

for fans of: mean estimation, online learning with log loss, optimal portfolios, hypothesis testing with E-values, etc.

dig in:
arxiv.org/abs/2412.02640
On the optimality of coin-betting for mean estimation
Confidence sequences are sequences of confidence sets that adapt to incoming data while maintaining validity. Recent advances have introduced an algorithmic formulation for constructing some of the ti...
arxiv.org
December 4, 2024 at 8:13 AM
I just found out that many people in the industry say that logistic regression (sigmoid + BCE loss) is not a regression algorithm, but a classification algorithm. And that the name "Logistic **Regression**" is wrong...

What do you call a model for estimating the conditional mean?
December 1, 2024 at 9:04 AM
@bsky.app What's the excuse for blocking alpindale? Is ML research considered "trolling the community"?
November 28, 2024 at 5:41 PM
Beautiful!
November 26, 2024 at 8:30 PM
Reposted by Alex Shtoff
The PCP theorem, a jewel of theoretical computer science, establishes that any NP statement can be assessed by a randomized verifier who only checks a vanishing fraction of the proof (indeed, a constant # of characters!)

This has had incredible impact, most notably on how ML reviews are conducted
November 26, 2024 at 5:33 AM
Reposted by Alex Shtoff
We are organising the First International Conference on Probabilistic Numerics (ProbNum 2025) at EURECOM in southern France in Sep 2025. Topics: AI, ML, Stat, Sim, and Numerics. Reposts very much appreciated!

probnum25.github.io
November 17, 2024 at 7:06 AM
Everybody likes complaining about ICLR... but after a rebuttal phase, the authors of one of the papers I reviewed addressed my concerns well, revised the paper according to the reviews of the other reviewers, and I increased their score. At least here - the process worked :)
November 26, 2024 at 8:16 AM