prxtml
@prxtml.bsky.social
I am real, just not actively interactive.
Reposted by prxtml
They evaluated models pre-trained with 1024-token contexts, then tested them on sequences up to 10,240 tokens.

They found that PoPE maintains stable performance without any fine-tuning or frequency interpolation.

Paper: arxiv.org/abs/2509.10534
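A minimal sketch of the evaluation protocol the post describes: score a model pre-trained at 1024 tokens on progressively longer sequences and check whether perplexity stays flat. The model path, text file, and lengths below are placeholders, not the paper's actual setup; PoPE itself is not assumed to be available in standard libraries.

```python
# Hypothetical length-extrapolation check: compute perplexity at increasing
# context lengths for a model pre-trained with a 1024-token context.
# MODEL_PATH and the text source are placeholders.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/model-pretrained-at-1024"  # placeholder; must support longer contexts
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH).to(device).eval()

# Any sufficiently long document works; this path is a placeholder.
with open("long_document.txt") as f:
    ids = tokenizer(f.read(), return_tensors="pt").input_ids[0]

for length in [1024, 2048, 4096, 8192, 10240]:
    if ids.size(0) < length:
        break
    chunk = ids[:length].unsqueeze(0).to(device)
    with torch.no_grad():
        # Teacher-forced loss over the whole chunk; exp(loss) gives perplexity.
        loss = model(chunk, labels=chunk).loss
    print(f"len={length:6d}  ppl={math.exp(loss.item()):.2f}")
```

If the position encoding extrapolates well, the printed perplexities should stay roughly stable as the length grows, without any fine-tuning or frequency interpolation.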
December 26, 2025 at 2:27 AM
Reposted by prxtml
“You cannot escape LLMs in the same way you cannot escape the existence of thermonuclear bombs and biological warfare programs” do you see why people keep screaming at you yet or do I gotta get so sardonic that I can kill a cockney with the punchline
December 26, 2025 at 6:38 PM
Reposted by prxtml
In our upcoming #ICML2025 paper, we introduce the #NumberTokenLoss (NTL) to address this; see the demo above! NTL is a regression-style loss computed at the token level, with no extra regression head needed. We propose adding NTL on top of CE during LLM pretraining. Our experiments show (see ⬇️):
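A hedged sketch of a token-level number loss in the spirit of the post: at positions whose target token is a digit, take the model's softmax restricted to the digit tokens, compute the expected digit value, and penalize its squared distance to the true digit. The function name, the single-token digit assumption, and the weighting in the usage comment are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def number_token_loss(logits, targets, digit_token_ids, digit_values):
    """
    logits:          (batch, seq, vocab) raw model outputs
    targets:         (batch, seq) ground-truth token ids
    digit_token_ids: (10,) vocab ids of the tokens "0".."9" (assumed single tokens)
    digit_values:    (10,) float tensor [0., 1., ..., 9.]
    """
    # Positions whose target token is a digit.
    is_digit = torch.isin(targets, digit_token_ids)
    if not is_digit.any():
        return logits.new_zeros(())

    # Softmax restricted to the 10 digit tokens.
    digit_probs = F.softmax(logits[..., digit_token_ids], dim=-1)  # (batch, seq, 10)

    # Expected numeric value predicted at each position.
    expected = (digit_probs * digit_values).sum(-1)                # (batch, seq)

    # True numeric value of the target digit (lookup over the 10 digit ids).
    target_value = digit_values[
        (targets.unsqueeze(-1) == digit_token_ids).float().argmax(-1)
    ]

    # Squared error between expected and true digit value, only at digit positions.
    return ((expected - target_value) ** 2)[is_digit].mean()

# Usage sketch, as the post proposes: add NTL on top of cross-entropy.
# The 0.3 weight is an assumed value for illustration.
# loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten()) \
#        + 0.3 * number_token_loss(logits, targets, digit_ids, digit_vals)
```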
July 3, 2025 at 9:21 PM