nosharg.bsky.social
haven’t thought about this before tho so could be way off base here
May 24, 2025 at 3:32 AM
i actually think this is a probabilistic argument: after something so OOD, all the conditional probabilities are extremely small, so the model winds up sampling something weird, which then collapses the subsequent distribution onto some random part of the training corpus
May 24, 2025 at 3:31 AM
agreed. mechanistically i would buy that the induction heads get overloaded, but it's also just so out of distribution that the model has to resort to digging for scraps
May 24, 2025 at 3:25 AM
a circle in each plane? the matrices are in an O(2) × … × O(2) subgroup of O(n), right
May 24, 2025 at 3:19 AM
it's just a sequence-position-dependent rotation in latent space
May 24, 2025 at 2:58 AM
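The rotation the two posts above are describing can be sketched in a few lines of numpy. This is a minimal illustration of a rotary-style position embedding under standard assumptions (the usual geometric frequency schedule; `rope_rotate` is an illustrative name, not a reference implementation): each consecutive pair of latent dimensions gets rotated by a position-dependent angle, so the full matrix is block-diagonal with one 2×2 rotation per plane, i.e. it lives in an SO(2) × … × SO(2) subgroup of O(d).

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Rotate each consecutive pair of dimensions of x by a
    position-dependent angle (a rotary-position-embedding sketch).

    The overall transform is block-diagonal: one SO(2) rotation per
    2-D plane, so it sits inside SO(2) x ... x SO(2) < O(d)."""
    d = x.shape[-1]
    assert d % 2 == 0, "needs an even latent dimension"
    # one frequency per 2-D plane, geometric schedule
    freqs = base ** (-np.arange(0, d, 2) / d)
    theta = pos * freqs                       # angle per plane
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin      # 2x2 rotation, per plane
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Because every block is orthogonal, norms are preserved, and the inner product between two rotated vectors depends only on the difference of their positions, which is the usual motivation for encoding position this way.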