dchiang.bsky.social
Further, we show that deeper programs/formulas in C-RASP are strictly more expressive than shallower programs/formulas. Together, these results imply that in the above-defined variant, deeper transformers are strictly more expressive than shallower transformers.
June 23, 2025 at 11:56 AM
New on arXiv: Knee-Deep in C-RASP, by @pentagonalize.bsky.social, @cadilhac.bsky.social, and me. In the figure, the solid stepped line is our theoretical prediction based on what problems C-RASP can solve, and the numbers/colors show what transformers (without position embeddings) can learn.
June 23, 2025 at 11:56 AM