dchiang.bsky.social
@dchiang.bsky.social
Reposted
Read the cookbook: arxiv.org/abs/2510.00368

Join us for weekly seminars on formal language theory, ML, NLP, and more: flannseminars.github.io
October 3, 2025 at 4:24 PM
Reposted
There is no better way to understand what transformers can do than to get your hands dirty and construct them, weight by weight. The Transformer Cookbook provides a guide for anyone aiming to understand the expressive power of transformers at this formal level.
October 3, 2025 at 4:24 PM
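To make the weight-by-weight idea concrete, here is a minimal numpy sketch of one such construction, assuming one-hot embeddings over {a, b} and a single causally masked attention head with zero query/key weights (so attention is uniform over the prefix) and an identity value matrix; its output at each position is the fraction of a's seen so far. The specific weights are illustrative assumptions, not taken from the Cookbook.

```python
import numpy as np

def causal_softmax(scores):
    """Row-wise softmax restricted to positions <= the query position."""
    n = scores.shape[0]
    mask = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

# Hand-chosen weights (illustrative, not from the Cookbook):
# zero queries/keys -> every unmasked position gets equal attention;
# identity values   -> the head outputs the mean of the prefix embeddings.
W_Q = np.zeros((2, 2))
W_K = np.zeros((2, 2))
W_V = np.eye(2)

def prefix_fraction_of_a(string):
    emb = {"a": [1.0, 0.0], "b": [0.0, 1.0]}   # one-hot embeddings
    X = np.array([emb[c] for c in string])
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    A = causal_softmax(Q @ K.T)    # uniform attention over each prefix
    out = A @ V                    # mean of prefix embeddings
    return out[:, 0]               # coordinate 0 = fraction of a's so far

print(prefix_fraction_of_a("aabba"))   # approx. [1.0, 1.0, 0.667, 0.5, 0.6]
```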
Andy Yang @pentagonalize.bsky.social drove the conceptualization, theory, and experiments of this work. I was just the checker and editor!
June 23, 2025 at 6:50 PM
Although there is a lot of wiggle room in defining rounding/precision, our theoretical predictions match the experimental results surprisingly well!
June 23, 2025 at 11:56 AM
The separating languages are very simple: L_k is the language of strings made up of k blocks, each a run of one or more repetitions of a single symbol; e.g., L_3 contains the strings aba, aabbbbaaaaaa, etc. More blocks require more depth.
June 23, 2025 at 11:56 AM
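A quick sketch of the block-counting reading of L_k described above (the paper's exact alphabet and definition may differ): count the maximal runs of a repeated symbol and check that there are exactly k.

```python
import re

def num_blocks(s):
    """Number of maximal runs of a repeated symbol, e.g. 'aabbbbaaaaaa' -> 3."""
    return len(list(re.finditer(r"(.)\1*", s)))

def in_L(s, k):
    """Membership in L_k: the string consists of exactly k such blocks."""
    return len(s) > 0 and num_blocks(s) == k

print(in_L("aba", 3), in_L("aabbbbaaaaaa", 3), in_L("ab", 3))  # True True False
```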
Further, we show that deeper C-RASP programs/formulas are strictly more expressive than shallower ones. Together, these results imply that, in the above-defined variant, deeper transformers are strictly more expressive than shallower ones.
June 23, 2025 at 11:56 AM
C-RASP is a programmer-friendly version of "temporal logic with future-masked counting." We show that both are exactly equivalent to soft-attention transformers with fixed precision outside attention but no rounding inside attention (to avoid under/overflow when summing over the sequence).
June 23, 2025 at 11:56 AM
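As a toy illustration of the "future-masked counting" idea (plain Python, not actual C-RASP syntax): each position can count how often a predicate held over its own prefix, and such counts can then be compared.

```python
def prefix_count(predicate, tokens):
    """At each position i, count positions j <= i where predicate(tokens[j]) holds."""
    counts, total = [], 0
    for t in tokens:
        total += bool(predicate(t))
        counts.append(total)
    return counts

# Hypothetical example: checking whether a's stay ahead of b's at every prefix.
tokens = list("aabb")
count_a = prefix_count(lambda t: t == "a", tokens)   # [1, 2, 2, 2]
count_b = prefix_count(lambda t: t == "b", tokens)   # [0, 0, 1, 2]
print([ca >= cb for ca, cb in zip(count_a, count_b)])  # [True, True, True, True]
```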
(Out of the papers that Aarohi @aarsri.bsky.social has published while at Notre Dame, 80% have received an award!)
April 23, 2025 at 1:30 PM
In contrast, on text with variation involving new words or meanings (e.g., "lie" vs. "cap"), far more data is needed, but it leads to a massive breakthrough in performance.
April 23, 2025 at 1:30 PM
On text with character-level variation (e.g., "strategy" vs. "strat"), out-of-the-box performance improves with even a few additional training examples, but it approaches a plateau, suggesting that more data is not the solution.
April 23, 2025 at 1:30 PM
If you're submitting an abstract to @colmweb.org, might as well submit it to MSLD too! nlp.nd.edu/msld25/
March 20, 2025 at 3:05 AM
Registration at Midwest Speech and Language Days is free, poster printing is free, and we will be able to provide free lodging to a limited number of students. nlp.nd.edu/msld25/
March 20, 2025 at 2:55 AM
The meeting will feature keynote addresses by @mohitbansal.bsky.social, @davidrmortensen.bsky.social, Karen Livescu, and Heng Ji. Plus all of your great talks and posters! nlp.nd.edu/msld25
March 8, 2025 at 6:35 PM