queelius.bsky.social
@queelius.bsky.social
A bit off-topic, but the Gaussian tendency has me wondering: could this limit how well models handle rare or long-range dependencies in language? Maybe long-tailed distributions could help here, though I’m not sure about the trade-offs.
November 23, 2024 at 1:00 AM