pentagonalize.bsky.social
@pentagonalize.bsky.social
We present The Transformer Cookbook: a collection of recipes for programming algorithms directly into transformers!

Hungry for an induction head? Craving a Dyck language recognizer? We show you step-by-step how to cook up transformers for these algorithms and many more!
The Transformer Cookbook
We present the transformer cookbook: a collection of techniques for directly encoding algorithms into a transformer's parameters. This work addresses the steep learning curve of such endeavors, a prob...
arxiv.org
October 3, 2025 at 4:24 PM
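
For a flavor of what "programming an algorithm directly into a transformer" means, here is a minimal numpy sketch of a hand-wired "attend to the previous token" head, the kind of building block an induction-head construction starts from. This is only an illustration of the general idea; the toy sizes, one-hot positional encoding, and weight matrices are my own assumptions, not a recipe taken from the paper.

```python
import numpy as np

# Hand-coded "attend to the previous token" attention head (illustrative sketch).
n = 6                                   # sequence length
P = np.eye(n)                           # position i encoded as the one-hot vector e_i

W_Q = np.eye(n)                         # q_i = e_i
W_K = np.roll(np.eye(n), -1, axis=0)    # k_j = e_{j+1 mod n}, so <q_i, k_j> = 1 iff j = i - 1

scale = 100.0                           # large logits make softmax effectively one-hot
scores = scale * (P @ W_Q) @ (P @ W_K).T

# Causal mask: a position may only attend to itself and earlier positions.
mask = np.tril(np.ones((n, n), dtype=bool))
scores = np.where(mask, scores, -np.inf)

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)

# Row i puts essentially all attention on position i - 1
# (position 0 has no predecessor and attends to itself).
print(np.round(weights, 2))
```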
Reposted
New paper and two not-so-new papers on arXiv about transformer expressivity: (1) With @pentagonalize and Dana Angluin, "Simulating Hard Attention Using Soft Attention" arxiv.org/abs/2412.09925
Simulating Hard Attention Using Soft Attention
We study conditions under which transformers using soft attention can simulate hard attention, that is, effectively focus all attention on a subset of positions. First, we examine several variants of ...
arxiv.org
December 23, 2024 at 10:55 PM
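
As a rough illustration of the idea in the title (a sketch of the basic phenomenon, not the paper's construction): scaling up attention logits, or equivalently lowering the softmax temperature, pushes soft attention toward a one-hot pattern that concentrates on the highest-scoring position, which is what hard attention does.

```python
import numpy as np

def soft_attention(scores, temperature=1.0):
    """Softmax attention weights over a vector of scores."""
    z = scores / temperature
    z = z - z.max()                      # numerical stability
    w = np.exp(z)
    return w / w.sum()

scores = np.array([0.1, 0.5, 2.0, 1.9])  # toy scores with a unique maximum at index 2

print(soft_attention(scores, temperature=1.0))    # attention spread across positions
print(soft_attention(scores, temperature=0.01))   # nearly one-hot on the argmax: ~hard attention
```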