Lightnews — Scholar-powered news

@dchiang.bsky.social

27 followers 40 following 26 posts

dchiang.bsky.social

Further, we show that deeper programs/formulas in C-RASP are strictly more expressive than shallower programs/formulas. Together, these results imply that in the above-defined variant, deeper transformers are strictly more expressive than shallower transformers.

June 23, 2025 at 11:56 AM

dchiang.bsky.social

New on arXiv: Knee-Deep in C-RASP, by @pentagonalize.bsky.social, @cadilhac.bsky.social, and me. The solid stepped line is our theoretical prediction based on what problems C-RASP can solve, and the numbers/colors are what transformers (no position embedding) can learn.

June 23, 2025 at 11:56 AM
