Lightnews — Scholar-powered news

Hanlin Zhang

@hlzhang109.bsky.social

Dive in 📑: arxiv.org/abs/2506.16029

Blog Post 📝: zhentingqi.github.io/internal/pro...

Thread 🧵: x.com/_hanlin_zhan...

Work by Zhenting Qi, and the team Fan Nie, Alexandre Alahi, @jameszou.bsky.social, Himabindu Lakkaraju, Yilun Du, Eric Xing, @shamkakade.bsky.social

EvoLM: In Search of Lost Language Model Training Dynamics

Modern language model (LM) training has been divided into multiple stages, making it difficult for downstream developers to evaluate the impact of design choices made at each stage. We present EvoLM, ...

arxiv.org

July 2, 2025 at 8:05 PM

Hanlin Zhang

@hlzhang109.bsky.social

✅ Open-source everything — models, data, training, and evaluation pipeline

✅ Maintain the EvoLM model family with clear data provenance

✅ Support the community in extending this foundation for future LLM research

EvoLM: In Search of Lost Language Model Training Dynamics

Modern language model (LM) training has been divided into multiple stages, making it difficult for downstream developers to evaluate the impact of design choices made at each stage. We present EvoLM, ...

arxiv.org

July 2, 2025 at 8:05 PM

Hanlin Zhang

@hlzhang109.bsky.social

We seek to:

✅ Build a fully transparent and reproducible model suite for studying LM training

✅ Quantify how each training phase contributes to upstream cloze task performance and downstream generative task performance, considering both in-domain and out-of-domain settings

EvoLM: In Search of Lost Language Model Training Dynamics

Modern language model (LM) training has been divided into multiple stages, making it difficult for downstream developers to evaluate the impact of design choices made at each stage. We present EvoLM, ...

arxiv.org

July 2, 2025 at 8:05 PM

Hanlin Zhang

@hlzhang109.bsky.social

[4/4] Prompt injection can extract private datastore content—verbatim—from RAG:

– Black-box attack can leak 41% of a book with just 100 queries
– Vulnerability grows with model size and instruction tuning
– Mitigation: eliminate position bias (via PINE)+system prompts

(arxiv.org/abs/2402.17840)

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

Retrieval-Augmented Generation (RAG) improves pre-trained models by incorporating external knowledge at test time to enable customized adaptation. We study the risk of datastore leakage in Retrieval-I...

arxiv.org

April 23, 2025 at 1:35 AM

Hanlin Zhang

@hlzhang109.bsky.social

[3/4] LMs can suffer from position bias—they favor content based on where it appears. This can hurt reasoning and evaluation.
We introduce PINE, a training-free method that eliminates position bias via bidirectional attention+reordering docs by attention scores.
(arxiv.org/abs/2407.01100)

Eliminating Position Bias of Language Models: A Mechanistic Approach

Position bias has proven to be a prevalent issue of modern language models (LMs), where the models prioritize content based on its position within the given context. This bias often leads to unexpecte...

arxiv.org

April 23, 2025 at 1:35 AM

Hanlin Zhang

@hlzhang109.bsky.social

[2/4] Can LLMs self-improve by verifying their own outputs? This paper says yes—with a twist. The key lies in a measure: the Generation-Verification Gap (GV-Gap) that scales with pretraining FLOPs in a log-linear trend.
Oral @yus167.bsky.social 6A: Sat 26 Apr 4:18-4:30.
(arxiv.org/abs/2412.02674)

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models

Self-improvement is a mechanism in Large Language Model (LLM) pre-training, post-training and test-time inference. We explore a framework where the model verifies its own outputs, filters or reweights...

arxiv.org

April 23, 2025 at 1:35 AM

Hanlin Zhang

@hlzhang109.bsky.social

[1/4]
This work:
- Shows that CBS scales with data size, not model size
- Provides theory + empirical scaling laws
- Suggests more data → higher CBS → more efficient data-parallel
Learn more: x.com/_hanlin_zhan...
Poster at Hall 3 #376, Thu 24 Apr 10-12:30.

Hanlin Zhang on X: "Critical batch size is crucial for reducing the wall-clock time of large-scale training runs with data parallelism. We find that it depends primarily on data size. 🧵 [1/n] Paper 📑: https://t.co/LFAPtzRkD9 Blog 📝: https://t.co/tGhR6HDgnE" / X

Critical batch size is crucial for reducing the wall-clock time of large-scale training runs with data parallelism. We find that it depends primarily on data size. 🧵 [1/n] Paper 📑: https://t.co/LFAPtzRkD9 Blog 📝: https://t.co/tGhR6HDgnE

x.com

April 23, 2025 at 1:35 AM

Hanlin Zhang

@hlzhang109.bsky.social

[1/4] Modern large-scale LM training is limited not just by compute, but by data movement—a classic Von Neumann bottleneck (research.ibm.com/blog/why-von...).

Scaling batch size reduces optimization steps, but only up to a point—the Critical Batch Size (CBS).

How the von Neumann bottleneck is impeding AI computing

The von Neumann architecture, which separates compute and memory, is perfect for conventional computing. But it creates a data traffic jam for AI.

research.ibm.com

April 23, 2025 at 1:35 AM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news