pentagonalize.bsky.social
@pentagonalize.bsky.social
We present The Transformer Cookbook: a collection of recipes for programming algorithms directly into transformers!

Hungry for an induction head? Craving a Dyck language recognizer? We show you step-by-step how to cook up transformers for these algorithms and many more!
The Transformer Cookbook
We present the transformer cookbook: a collection of techniques for directly encoding algorithms into a transformer's parameters. This work addresses the steep learning curve of such endeavors, a prob...
arxiv.org
October 3, 2025 at 4:24 PM
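
For a flavor of what "programming an algorithm directly into a transformer" means, here is a minimal numpy sketch of a hand-wired "attend to the previous token" head, the kind of building block an induction-head construction starts from. This is only an illustration of the general idea; the toy sizes, one-hot positional encoding, and weight matrices are my own assumptions, not a recipe taken from the paper.

```python
import numpy as np

# Hand-coded "attend to the previous token" attention head (illustrative sketch).
n = 6                                   # sequence length
P = np.eye(n)                           # position i encoded as the one-hot vector e_i

W_Q = np.eye(n)                         # q_i = e_i
W_K = np.roll(np.eye(n), -1, axis=0)    # k_j = e_{j+1 mod n}, so <q_i, k_j> = 1 iff j = i - 1

scale = 100.0                           # large logits make softmax effectively one-hot
scores = scale * (P @ W_Q) @ (P @ W_K).T

# Causal mask: a position may only attend to itself and earlier positions.
mask = np.tril(np.ones((n, n), dtype=bool))
scores = np.where(mask, scores, -np.inf)

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)

# Row i puts essentially all attention on position i - 1
# (position 0 has no predecessor and attends to itself).
print(np.round(weights, 2))
```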
Reposted
New paper and two not-so-new papers on arXiv about transformer expressivity: (1) With @pentagonalize and Dana Angluin, "Simulating Hard Attention Using Soft Attention" arxiv.org/abs/2412.09925
Simulating Hard Attention Using Soft Attention
We study conditions under which transformers using soft attention can simulate hard attention, that is, effectively focus all attention on a subset of positions. First, we examine several variants of ...
arxiv.org
December 23, 2024 at 10:55 PM
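
As a rough illustration of the idea in the title (a sketch of the basic phenomenon, not the paper's construction): scaling up attention logits, or equivalently lowering the softmax temperature, pushes soft attention toward a one-hot pattern that concentrates on the highest-scoring position, which is what hard attention does.

```python
import numpy as np

def soft_attention(scores, temperature=1.0):
    """Softmax attention weights over a vector of scores."""
    z = scores / temperature
    z = z - z.max()                      # numerical stability
    w = np.exp(z)
    return w / w.sum()

scores = np.array([0.1, 0.5, 2.0, 1.9])  # toy scores with a unique maximum at index 2

print(soft_attention(scores, temperature=1.0))    # attention spread across positions
print(soft_attention(scores, temperature=0.01))   # nearly one-hot on the argmax: ~hard attention
```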