Cong Lu
@cong-ml.bsky.social
Research Scientist @ Google DeepMind, working on open-ended learning and AI for Scientific Discovery.
More major advantages! 🌟

COST-EFFECTIVE: StochasTok lets enhanced subword skills be seamlessly 'retrofitted' into existing pretrained models, avoiding costly pretraining from scratch!
ENHANCED ROBUSTNESS: Improves resilience to alternative tokenizations! (see examples)

[6/]
June 11, 2025 at 12:09 PM
Empirically, we find:
LANGUAGE: As hoped, StochasTok unlocks language manipulation ability! (see task examples below)
MATH: Furthermore, StochasTok dramatically changes the learning dynamics of multi-digit addition, enabling grokking and even generalization to UNSEEN TOKENIZERS!🤯

[5/]
June 11, 2025 at 12:09 PM
The underlying StochasTok algorithm is extremely simple!

1️⃣ Simply tokenize text with ANY base tokenizer,
2️⃣ Then, stochastically split some of those tokens into equivalent token pairs.

That’s basically it! Repeat step 2 for the desired granularity.
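A minimal sketch of the splitting step in Python (illustrative only: 'vocab', 'inv_vocab', and 'split_prob' are assumed names, not the paper's actual API):

import random

def stochastok_split(token_ids, vocab, inv_vocab, split_prob=0.1):
    # vocab: string -> token id; inv_vocab: token id -> string
    out = []
    for tid in token_ids:
        text = inv_vocab[tid]
        # Every way to cut this token into two pieces that are both in-vocab.
        pairs = [(vocab[text[:i]], vocab[text[i:]])
                 for i in range(1, len(text))
                 if text[:i] in vocab and text[i:] in vocab]
        if pairs and random.random() < split_prob:
            out.extend(random.choice(pairs))  # emit an equivalent token pair
        else:
            out.append(tid)
    return out

Applying stochastok_split repeatedly splits tokens into ever-smaller in-vocab pieces, which is the "repeat step 2" knob for granularity.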

[3/]
June 11, 2025 at 12:09 PM
🤔The problem: Standard tokenization assigns each token an arbitrary distinct ID, hiding subword structure from the model, e.g., ‘book’=3092 and ‘cook’=171691 differ by a single letter yet get completely unrelated IDs.

🎉The solution: Allow LLMs to naturally 'see inside' tokens via alternative tokenizations!
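A toy illustration in Python (all token IDs except 'book'=3092 are made up): with a BPE-style vocab, the same string admits several equivalent tokenizations, so seeing them during training exposes shared subword structure:

vocab = {'book': 3092, 'bo': 17, 'ok': 482, 'b': 65, 'ook': 9120}
word = 'book'
# Enumerate two-piece tokenizations whose pieces are both in the vocab.
alts = [[word]] + [[word[:i], word[i:]]
                   for i in range(1, len(word))
                   if word[:i] in vocab and word[i:] in vocab]
print(alts)  # [['book'], ['b', 'ook'], ['bo', 'ok']]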

[2/]
June 11, 2025 at 12:09 PM
🚀Introducing “StochasTok: Improving Fine-Grained Subword Understanding in LLMs”!🚀

LLMs are incredible but still struggle disproportionately with subword-level tasks, e.g., character counting, wordplay, multi-digit arithmetic, fixing typos… Enter StochasTok, led by Anya Sims!

[1/]
June 11, 2025 at 12:09 PM
Interested in robust model-based offline RL algorithms? Come check out Anya Sims presenting our new paper investigating the edge-of-reach problem in offline MBRL!

📍East Exhibit Hall A-C #4603

#NeurIPS2024
December 12, 2024 at 12:34 AM