jdrex.bsky.social
@jdrex.bsky.social
I was reading aclanthology.org/2025.finding... earlier, and it seems like it could be related: maybe duplicating blocks of layers instead of individual layers?
November 5, 2025 at 10:38 PM
Do LLMs have to be transformers? I think of "LLM" as a description of the input/output paradigm (plus size, I guess), not the internal mechanisms.
October 2, 2025 at 4:30 PM