Michael Hahn
m-hahn.bsky.social
Prof at Saarland. NLP and machine learning.
Theory and interpretability of LLMs.
https://www.mhahn.info
8/8 Big thanks to my co-authors Alireza, Xinting, and Mark! 🔗 Read the paper: arXiv.org/abs/2502.02393
Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Chain-of-thought reasoning and scratchpads have emerged as critical tools for enhancing the computational capabilities of transformers. While theoretical results show that polynomial-length scratchpad...
May 5, 2025 at 12:25 PM
7/8 Takeaway: there is no free lunch. Long CoTs are information-theoretically necessary for these tasks; compression tricks will hit these hard limits unless the model architecture itself is revamped.
6/8 Graph reachability: Given a DAG and two nodes, is there a path? You might try to “cheat” with clever CoTs—but in fact you need Ω(|E| log |V|) steps, matching BFS’s runtime (plus indexing overhead).
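For concreteness, here is a minimal BFS sketch of the reachability check the post compares against. The function name and edge counter are illustrative, not from the paper; the point is that BFS scans each edge once, so its step count is O(|E|), and the paper's CoT bound adds a log|V| factor for writing node indices.

```python
from collections import deque

def reachable(adj, s, t):
    """BFS on a DAG given as adjacency lists.

    Returns (found, edges_scanned). Each edge is scanned at most once,
    so the work is O(|E|) steps; writing node indices in a CoT costs an
    extra log|V| factor per step, matching the Omega(|E| log |V|) bound.
    """
    seen, frontier, edges_scanned = {s}, deque([s]), 0
    while frontier:
        u = frontier.popleft()
        if u == t:
            return True, edges_scanned
        for v in adj.get(u, []):
            edges_scanned += 1
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return False, edges_scanned

# Toy DAG: 0 -> 1 -> 3, 0 -> 2
adj = {0: [1, 2], 1: [3]}
```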
5/8 Multiplication: LLMs notoriously flub the middle digits of big-integer products. Those digits hinge on all input bits, so you get a linear CoT lower bound. Our best upper bound uses Schönhage–Strassen in O(n log n) steps; closing the log n gap is open.
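For scale, a sketch (my own illustration, not the paper's construction) of the most explicit scratchpad a model could write for multiplication: grade-school style, one CoT step per digit pair. That is ~n² steps for n-digit inputs, sitting well above both the linear lower bound and the O(n log n) upper bound in the post.

```python
def long_multiplication_cot(a: int, b: int):
    """Grade-school multiplication as an explicit chain of thought.

    Emits one scratchpad step per digit pair, so two n-digit inputs
    yield n*n steps -- above both the linear lower bound and the
    O(n log n) Schoenhage-Strassen upper bound discussed in the thread.
    """
    xs = [int(d) for d in str(a)][::-1]  # least-significant digit first
    ys = [int(d) for d in str(b)][::-1]
    steps, total = [], 0
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            total += x * y * 10 ** (i + j)
            steps.append(f"digit {i} of a * digit {j} of b -> {x * y}")
    return total, steps

product, cot = long_multiplication_cot(1234, 5678)
# two 4-digit inputs -> 16 scratchpad steps
```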
4/8 Regular languages: Hard-attention Transformers without CoT sit in AC⁰. To recognize anything beyond AC⁰—e.g., Parity—you need at least linear-length CoTs. Sublinear won’t cut it.
3/8 Yes! We derive tight lower bounds for several algorithmic tasks. The theory assumes hard attention, but soft-attention experiments paint the same picture.
2/8 Prior work shows that CoT-equipped Transformers can simulate P-time computations via polynomial-length traces. But those broad results don’t pin down exactly how long you need to solve a specific problem. We ask: can we get fine-grained lower bounds on CoT length?