prxtml
@prxtml.bsky.social
I am real, just not actively interactive.
Reposted by prxtml
They evaluated models pre-trained with 1024-token contexts, then tested them on sequences up to 10,240 tokens.

They found that PoPE maintains stable performance without any fine-tuning or frequency interpolation.

Paper: arxiv.org/abs/2509.10534
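A minimal sketch of the evaluation protocol the post describes: score a model pre-trained at 1024 tokens on progressively longer sequences and check whether perplexity stays flat. The model path, text file, and lengths below are placeholders, not the paper's actual setup; PoPE itself is not assumed to be available in standard libraries.

```python
# Hypothetical length-extrapolation check: compute perplexity at increasing
# context lengths for a model pre-trained with a 1024-token context.
# MODEL_PATH and the text source are placeholders.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/model-pretrained-at-1024"  # placeholder; must support longer contexts
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH).to(device).eval()

# Any sufficiently long document works; this path is a placeholder.
with open("long_document.txt") as f:
    ids = tokenizer(f.read(), return_tensors="pt").input_ids[0]

for length in [1024, 2048, 4096, 8192, 10240]:
    if ids.size(0) < length:
        break
    chunk = ids[:length].unsqueeze(0).to(device)
    with torch.no_grad():
        # Teacher-forced loss over the whole chunk; exp(loss) gives perplexity.
        loss = model(chunk, labels=chunk).loss
    print(f"len={length:6d}  ppl={math.exp(loss.item()):.2f}")
```

If the position encoding extrapolates well, the printed perplexities should stay roughly stable as the length grows, without any fine-tuning or frequency interpolation.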
December 26, 2025 at 2:27 AM
Reposted by prxtml
“You cannot escape LLMs in the same way you cannot escape the existence of thermonuclear bombs and biological warfare programs” do you see why people keep screaming at you yet or do I gotta get so sardonic that I can kill a cockney with the punchline
December 26, 2025 at 6:38 PM
Reposted by prxtml
In our upcoming #ICML2025 paper, we introduce the #NumberTokenLoss (NTL) to address this; see the demo above! NTL is a regression-style loss computed at the token level, with no extra regression head needed. We propose adding NTL on top of CE during LLM pretraining. Our experiments show (see ⬇️):
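A hedged sketch of a token-level number loss in the spirit of the post: at positions whose target token is a digit, take the model's softmax restricted to the digit tokens, compute the expected digit value, and penalize its squared distance to the true digit. The function name, the single-token digit assumption, and the weighting in the usage comment are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def number_token_loss(logits, targets, digit_token_ids, digit_values):
    """
    logits:          (batch, seq, vocab) raw model outputs
    targets:         (batch, seq) ground-truth token ids
    digit_token_ids: (10,) vocab ids of the tokens "0".."9" (assumed single tokens)
    digit_values:    (10,) float tensor [0., 1., ..., 9.]
    """
    # Positions whose target token is a digit.
    is_digit = torch.isin(targets, digit_token_ids)
    if not is_digit.any():
        return logits.new_zeros(())

    # Softmax restricted to the 10 digit tokens.
    digit_probs = F.softmax(logits[..., digit_token_ids], dim=-1)  # (batch, seq, 10)

    # Expected numeric value predicted at each position.
    expected = (digit_probs * digit_values).sum(-1)                # (batch, seq)

    # True numeric value of the target digit (lookup over the 10 digit ids).
    target_value = digit_values[
        (targets.unsqueeze(-1) == digit_token_ids).float().argmax(-1)
    ]

    # Squared error between expected and true digit value, only at digit positions.
    return ((expected - target_value) ** 2)[is_digit].mean()

# Usage sketch, as the post proposes: add NTL on top of cross-entropy.
# The 0.3 weight is an assumed value for illustration.
# loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten()) \
#        + 0.3 * number_token_loss(logits, targets, digit_ids, digit_vals)
```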
July 3, 2025 at 9:21 PM