dchiang.bsky.social
@dchiang.bsky.social
Reposted
Read the cookbook: arxiv.org/abs/2510.00368

Join us for weekly seminars on formal language theory, ML, NLP, and more: flannseminars.github.io
October 3, 2025 at 4:24 PM
Reposted
There is no better way to understand what transformers can do than to get your hands dirty and construct them, weight by weight. The Transformer Cookbook provides a guide for anyone aiming to understand the expressive power of transformers at this formal level.
October 3, 2025 at 4:24 PM
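To make the weight-by-weight idea concrete, here is a minimal numpy sketch of one such construction, assuming one-hot embeddings over {a, b} and a single causally masked attention head with zero query/key weights (so attention is uniform over the prefix) and an identity value matrix; its output at each position is the fraction of a's seen so far. The specific weights are illustrative assumptions, not taken from the Cookbook.

```python
import numpy as np

def causal_softmax(scores):
    """Row-wise softmax restricted to positions <= the query position."""
    n = scores.shape[0]
    mask = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

# Hand-chosen weights (illustrative, not from the Cookbook):
# zero queries/keys -> every unmasked position gets equal attention;
# identity values   -> the head outputs the mean of the prefix embeddings.
W_Q = np.zeros((2, 2))
W_K = np.zeros((2, 2))
W_V = np.eye(2)

def prefix_fraction_of_a(string):
    emb = {"a": [1.0, 0.0], "b": [0.0, 1.0]}   # one-hot embeddings
    X = np.array([emb[c] for c in string])
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    A = causal_softmax(Q @ K.T)    # uniform attention over each prefix
    out = A @ V                    # mean of prefix embeddings
    return out[:, 0]               # coordinate 0 = fraction of a's so far

print(prefix_fraction_of_a("aabba"))   # approx. [1.0, 1.0, 0.667, 0.5, 0.6]
```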
Andy Yang @pentagonalize.bsky.social drove the conceptualization, theory, and experiments of this work. I was just the checker and editor!
June 23, 2025 at 6:50 PM
Although there is a lot of wiggle room in defining rounding/precision, our theoretical predictions match the experimental results surprisingly well!
June 23, 2025 at 11:56 AM
The separating languages are very simple: L_k is the language of strings made up of k blocks, each a run of one or more repetitions of a single symbol; e.g., L_3 contains the strings aba, aabbbbaaaaaa, etc. More blocks require more depth.
June 23, 2025 at 11:56 AM
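A quick sketch of the block-counting reading of L_k described above (the paper's exact alphabet and definition may differ): count the maximal runs of a repeated symbol and check that there are exactly k.

```python
import re

def num_blocks(s):
    """Number of maximal runs of a repeated symbol, e.g. 'aabbbbaaaaaa' -> 3."""
    return len(list(re.finditer(r"(.)\1*", s)))

def in_L(s, k):
    """Membership in L_k: the string consists of exactly k such blocks."""
    return len(s) > 0 and num_blocks(s) == k

print(in_L("aba", 3), in_L("aabbbbaaaaaa", 3), in_L("ab", 3))  # True True False
```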
Further, we show that deeper C-RASP programs/formulas are strictly more expressive than shallower ones. Together, these results imply that, in the above-defined variant, deeper transformers are strictly more expressive than shallower ones.
June 23, 2025 at 11:56 AM
C-RASP is a programmer-friendly version of "temporal logic with future-masked counting." We show that both are exactly equivalent to soft-attention transformers with fixed precision outside attention but no rounding inside attention (to avoid under/overflow when summing over the sequence).
June 23, 2025 at 11:56 AM
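As a toy illustration of the "future-masked counting" idea (plain Python, not actual C-RASP syntax): each position can count how often a predicate held over its own prefix, and such counts can then be compared.

```python
def prefix_count(predicate, tokens):
    """At each position i, count positions j <= i where predicate(tokens[j]) holds."""
    counts, total = [], 0
    for t in tokens:
        total += bool(predicate(t))
        counts.append(total)
    return counts

# Hypothetical example: checking whether a's stay ahead of b's at every prefix.
tokens = list("aabb")
count_a = prefix_count(lambda t: t == "a", tokens)   # [1, 2, 2, 2]
count_b = prefix_count(lambda t: t == "b", tokens)   # [0, 0, 1, 2]
print([ca >= cb for ca, cb in zip(count_a, count_b)])  # [True, True, True, True]
```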
(Out of the papers that Aarohi @aarsri.bsky.social has published while at Notre Dame, 80% have received an award!)
April 23, 2025 at 1:30 PM
In contrast, on text with variation involving new words or meanings (e.g., "lie" vs. "cap"), far more data is needed, but it leads to a massive breakthrough in performance.
April 23, 2025 at 1:30 PM
On text with character-level variation (e.g., "strategy" vs. "strat"), out-of-the-box performance improves with even a few additional training examples, but it approaches a plateau, suggesting that more data is not the solution.
April 23, 2025 at 1:30 PM
If you're submitting an abstract to @colmweb.org, might as well submit it to MSLD too! nlp.nd.edu/msld25/
March 20, 2025 at 3:05 AM
Registration at Midwest Speech and Language Days is free, poster printing is free, and we will be able to provide free lodging to a limited number of students. nlp.nd.edu/msld25/
March 20, 2025 at 2:55 AM
The meeting will feature keynote addresses by @mohitbansal.bsky.social, @davidrmortensen.bsky.social, Karen Livescu, and Heng Ji. Plus all of your great talks and posters! nlp.nd.edu/msld25
March 8, 2025 at 6:35 PM