https://hanlin-zhang.com
Blog Post 📝: zhentingqi.github.io/internal/pro...
Thread 🧵: x.com/_hanlin_zhan...
Work by Zhenting Qi and the team: Fan Nie, Alexandre Alahi, @jameszou.bsky.social, Himabindu Lakkaraju, Yilun Du, Eric Xing, @shamkakade.bsky.social
✅ Maintain the EvoLM model family with clear data provenance
✅ Support the community in extending this foundation for future LLM research
✅ Build a fully transparent and reproducible model suite for studying LM training
✅ Quantify how each training phase contributes to upstream cloze task performance and downstream generative task performance, considering both in-domain and out-of-domain settings
– Black-box attack can leak 41% of a book with just 100 queries
– Vulnerability grows with model size and instruction tuning
– Mitigation: eliminate position bias (via PINE) + system prompts
(arxiv.org/abs/2402.17840)
We introduce PINE, a training-free method that eliminates position bias via bidirectional attention and reordering documents by their attention scores.
(arxiv.org/abs/2407.01100)
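To make the reordering idea concrete, here is a minimal sketch of the content-based ordering step (illustrative only, not the authors' implementation; `reorder_by_attention` and `doc_attention_scores` are hypothetical names, and the scores are assumed to come from a bidirectional, non-causal attention pass over the candidate documents):

```python
import torch

def reorder_by_attention(docs: list[str], doc_attention_scores: torch.Tensor) -> list[str]:
    """Assign document order by attention score instead of input position.

    With bidirectional attention among documents, no document is privileged
    by where it appears in the prompt; sorting by attention mass then makes
    the effective ordering depend on content alone.
    """
    order = torch.argsort(doc_attention_scores, descending=True)
    return [docs[i] for i in order.tolist()]

# Toy usage: the highest-scoring document is placed first (closest to the
# query under this position assignment) regardless of its input position.
docs = ["doc A", "doc B", "doc C"]
scores = torch.tensor([0.2, 0.7, 0.1])
print(reorder_by_attention(docs, scores))  # ['doc B', 'doc A', 'doc C']
```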
Oral presentation by @yus167.bsky.social, Session 6A: Sat 26 Apr, 4:18-4:30.
(arxiv.org/abs/2412.02674)
This work:
- Shows that CBS scales with data size, not model size
- Provides theory + empirical scaling laws
- Suggests more data → higher CBS → more efficient data-parallel training
Learn more: x.com/_hanlin_zhan...
Poster at Hall 3 #376, Thu 24 Apr 10-12:30.
Scaling batch size reduces optimization steps, but only up to a point—the Critical Batch Size (CBS).
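As a toy illustration of that saturation, here is a numeric sketch using the hyperbolic steps/batch-size tradeoff common in the large-batch training literature, S(B) = S_min * (1 + B_crit / B); the constants below are made up, not taken from the paper:

```python
S_MIN = 1_000   # hypothetical: steps needed at (effectively) infinite batch size
B_CRIT = 4_096  # hypothetical: critical batch size

def steps_to_target(batch_size: int) -> float:
    """Steps to reach a fixed target loss at a given batch size."""
    return S_MIN * (1 + B_CRIT / batch_size)

for b in [256, 512, 1024, 2048, 4096, 8192, 16384]:
    print(f"B={b:>6}  steps ~ {steps_to_target(b):>8.0f}")

# Below B_CRIT, doubling the batch nearly halves the steps; above it, the
# curve flattens toward S_MIN. The paper's finding is that B_CRIT grows
# with data size rather than model size.
```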