placeholder720.bsky.social
@placeholder720.bsky.social
"enriched" forward pass
November 17, 2025 at 6:03 PM
makes sense, models can totally smuggle information in the kv cache across token indices, but if we suppose some intermediate computation is completely independent from the token we emit, then this info can't participate in any of the more complex stuff a la arxiv.org/abs/2402.12875 - it's just an
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Instructing the model to generate a sequence of intermediate steps, a.k.a., a chain of thought (CoT), is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetic...
arxiv.org
November 17, 2025 at 6:02 PM
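As a toy illustration of the serial-depth point above (mine, not from the thread; the layer count, function names, and numbers are assumptions, and the bound is the informal one from the linked paper, not a formal statement):

```python
# Toy accounting sketch: why a value that only ever lives in the KV cache gets
# a bounded amount of serial computation, while emitting it as a
# chain-of-thought token does not. Layer count and names are assumptions.

N_LAYERS = 32

def latent_serial_depth(write_layer: int, n_future_tokens: int) -> int:
    # Info written to the residual stream at `write_layer` of some position can
    # only be picked up by attention at strictly later layers of later
    # positions, so each cross-token hop moves it *up* the stack and it never
    # re-enters layer 0. The serial depth applied on top of it is bounded by
    # the layers remaining above the write, no matter how long generation runs
    # (`n_future_tokens` is deliberately unused to make that point).
    return N_LAYERS - write_layer

def cot_serial_depth(n_cot_tokens: int) -> int:
    # If the value is instead emitted as tokens and re-embedded, every emitted
    # token restarts at layer 0, so serial depth grows with the CoT length --
    # the extra power the arxiv.org/abs/2402.12875 result is about.
    return N_LAYERS * (n_cot_tokens + 1)

print(latent_serial_depth(write_layer=16, n_future_tokens=10_000))  # 16, fixed
print(cot_serial_depth(n_cot_tokens=10))                            # 352, grows
```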
If I am an expert in layer 16 of 32 of a vanilla transformer and realize that my job at some token is to compute some sum so that it can be used down the line, I can do that, and then any attention head in layer 16 can deposit that info to any future token without any intermediate unembeddings, right?
November 17, 2025 at 5:41 PM
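A minimal sketch of the mechanism in the question above (my illustration, not the actual model; sizes, random weights, the `deposit` hook, and the 0-indexed layer numbering are assumptions): a value added to the residual stream at layer 16 of one position lands in the cached keys/values that later layers compute for that position, so attention at later layers of future positions can read it with no unembedding in between.

```python
# Minimal sketch: a toy decoder-only loop with a per-layer KV cache. The
# `deposit` argument stands in for the "sum" the layer-16 expert computes;
# everything else (sizes, random weights) is made up for illustration.
import numpy as np

N_LAYERS, D = 32, 64
rng = np.random.default_rng(0)
Wq = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_LAYERS)]
Wk = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_LAYERS)]
Wv = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_LAYERS)]
kv_cache = [([], []) for _ in range(N_LAYERS)]  # (keys, values) per layer

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward_one_position(x, deposit=None):
    """Run one new position through the stack, attending over the cache.

    `deposit` is added to the residual stream right after layer 16, like an
    expert writing its sum. Because keys/values at layer L are computed from
    the residual stream entering layer L, the deposit is visible in the cached
    K/V of layers 17..31 for this position -- so attention heads in those
    layers at *future* positions can read it directly, with no unembedding or
    re-embedding in between.
    """
    h = x  # residual stream for this position
    for layer in range(N_LAYERS):
        # cache this position's key/value for the layer, then attend over
        # everything cached so far (earlier positions + this one)
        kv_cache[layer][0].append(Wk[layer] @ h)
        kv_cache[layer][1].append(Wv[layer] @ h)
        K = np.stack(kv_cache[layer][0])
        V = np.stack(kv_cache[layer][1])
        h = h + V.T @ softmax(K @ (Wq[layer] @ h) / np.sqrt(D))
        if layer == 16 and deposit is not None:
            h = h + deposit  # the "expert" writes its sum into the residual stream
    return h

# position t: the layer-16 expert deposits a value
marker = np.zeros(D); marker[0] = 7.0
forward_one_position(rng.standard_normal(D), deposit=marker)
# position t+1: layers 17..31 attend over the cache and can pick the value up,
# even though no token carrying it was ever emitted
_ = forward_one_position(rng.standard_normal(D))
```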
I bet Heath Ceramics would do it www.youtube.com/watch?v=v678...
Wrong Turn on the Dragon - Numberphile
YouTube video by Numberphile
www.youtube.com
November 8, 2025 at 8:07 PM
Interesting, what about Muon/Shampoo or other spectrum-y ones?
September 3, 2025 at 4:40 AM
Reposted
modern ai is basically a bunch of rogue google employees taking google projects that were done pretty cautiously and making them less cautious
August 23, 2025 at 4:41 PM
Some days the Iliad being about Helen just becomes a lot more believable.
August 6, 2025 at 10:53 PM
🎯
November 30, 2024 at 6:23 PM