changho.bsky.social
@changho.bsky.social
Ph.D. student at @WisconsinCS @UWMadison
11/ We'd love to hear your thoughts and dive deeper into the discussion! 🚀
See you in Singapore at #ICLR2025!

Big thanks to my advisor @fredsala.bsky.social for his guidance and to John for his contributions!

Paper: arxiv.org/abs/2412.03881
Github: github.com/SprocketLab/...
Weak-to-Strong Generalization Through the Data-Centric Lens
The weak-to-strong generalization phenomenon is the driver for important machine learning applications including highly data-efficient learning and, most recently, performing superalignment. While dec...
February 5, 2025 at 6:41 PM
10/ By leveraging the strengths of simpler models and their understanding of easy patterns, stronger models can iteratively build upon this foundation to tackle increasingly complex challenges. We see this process as a systematic and scalable path toward achieving superintelligence.
February 5, 2025 at 6:40 PM
9/ Looking ahead, we're excited to explore data-centric mechanisms for weak-to-strong generalization. Just as scholars refine theories—building on past insights to deepen understanding and create new concepts—we believe weak-to-strong generalization follows a similar trajectory.
February 5, 2025 at 6:40 PM
8/ Key takeaways for practitioners:
- Instead of just improving algorithms, focus on selecting the right data!
- Prioritizing high-overlap data sources gives us better generalization.
February 5, 2025 at 6:39 PM
7/ How can we maximize overlap density when choosing data sources?

We frame data selection as a bandit problem, using UCB to balance exploration and exploitation across datasets. This strategically identifies and prioritizes sources with high overlap density, maximizing generalization.
February 5, 2025 at 6:39 PM
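The bandit framing in 7/ can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes the UCB1 variant (the post only says "UCB"), treats each data source as an arm, and uses a simulated 0/1 reward standing in for "the sampled point is an overlap point". The names `ucb_select`, `run_bandit`, and `true_overlap` are hypothetical; in practice the reward would have to come from an overlap estimate, since overlap points are latent.

```python
import math
import random


def ucb_select(counts, means, t, c=2.0):
    """Return the index of the data source with the highest UCB1 score."""
    # Sample every source once before trusting any estimate.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # Estimated reward plus an exploration bonus that shrinks with more samples.
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(true_overlap, rounds=2000, seed=0):
    """Simulate source selection with Bernoulli rewards (an assumption).

    true_overlap[i] is the hypothetical overlap density of source i,
    unknown to the selection rule.
    """
    rng = random.Random(seed)
    k = len(true_overlap)
    counts = [0] * k   # number of times each source was sampled
    means = [0.0] * k  # running estimate of each source's overlap density
    for t in range(1, rounds + 1):
        arm = ucb_select(counts, means, t)
        reward = 1.0 if rng.random() < true_overlap[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


# Three hypothetical sources with increasing overlap density.
counts, means = run_bandit([0.1, 0.3, 0.6])
```

In this toy setup the sampling budget concentrates on the source with the highest simulated overlap density, while the other sources still get sampled occasionally, which is the exploration/exploitation trade-off the post refers to.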
6/ To tackle this, we propose an overlap detection algorithm that uncovers these points in real-world datasets and helps explain both the presence and absence of weak-to-strong generalization.
February 5, 2025 at 6:38 PM
5/ However, identifying overlap density in real-world datasets is challenging. Overlapping points are latent!
February 5, 2025 at 6:38 PM
4/ 🔑 Key insights:
- Weak models can produce accurate pseudolabels based on easy patterns.
- Strong models leverage these labels to generalize on hard patterns.
- More overlap → better generalization
February 5, 2025 at 6:38 PM
3/ The amount of this overlap, i.e., the proportion of points containing both the easy and hard patterns, is the core quantity determining how much weak-to-strong generalization we get.
February 5, 2025 at 6:34 PM
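To make the quantity in 3/ concrete, here is a small sketch that computes the proportion of points containing both patterns. It assumes we are handed per-point indicators `has_easy` and `has_hard`, which is only true for synthetic data; the function name and the example values are hypothetical, and in real datasets these indicators are not observed.

```python
def overlap_density(has_easy, has_hard):
    """Fraction of points that contain both the easy and the hard pattern."""
    if len(has_easy) != len(has_hard):
        raise ValueError("indicator lists must have the same length")
    if not has_easy:
        raise ValueError("need at least one data point")
    # Count points where both indicators are true.
    both = sum(1 for e, h in zip(has_easy, has_hard) if e and h)
    return both / len(has_easy)


# Five illustrative points; the first and fourth contain both patterns.
has_easy = [True, True, False, True, False]
has_hard = [True, False, True, True, False]
density = overlap_density(has_easy, has_hard)  # 2 of 5 points overlap
```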
2/ The intuition is simple: generalization tracks the data points that contain both “easy” patterns (learnable by a weak model) and “challenging” patterns (learnable only by a stronger model). On such points, the weak model's predictions provide the signal a stronger model needs to learn the challenging patterns.
February 5, 2025 at 6:25 PM
1/ Weak-to-strong generalization, where a strong student model surpasses its weaker teacher model, is crucial for achieving 'superintelligence'. We propose a mechanism explaining when and why this happens.
February 5, 2025 at 6:22 PM