Lightnews — Scholar-powered news

changho.bsky.social

@changho.bsky.social

11/ We'd love to hear your thoughts and dive deeper into the discussion! 🚀
See you in Singapore at #ICLR2025!

Big thanks to my advisor
@fredsala.bsky.social for his guidance and to John for his contributions!

Paper: arxiv.org/abs/2412.03881
Github: github.com/SprocketLab/...

Weak-to-Strong Generalization Through the Data-Centric Lens

The weak-to-strong generalization phenomenon is the driver for important machine learning applications including highly data-efficient learning and, most recently, performing superalignment. While dec...

February 5, 2025 at 6:41 PM

changho.bsky.social

@changho.bsky.social

10/ By leveraging the strengths of simpler models and their understanding of easy patterns, stronger models can iteratively build upon this foundation to tackle increasingly complex challenges. We see this process as a systematic and scalable path toward achieving superintelligence.

February 5, 2025 at 6:40 PM

changho.bsky.social

@changho.bsky.social

9/ Looking ahead, we're excited to explore data-centric mechanisms for weak-to-strong generalization. Just as scholars refine theories—building on past insights to deepen understanding and create new concepts—we believe weak-to-strong generalization follows a similar trajectory.

February 5, 2025 at 6:40 PM

changho.bsky.social

@changho.bsky.social

8/ Key takeaways for practitioners:
- Instead of just improving algorithms, focus on selecting the right data!
- Prioritizing high-overlap data sources gives us better generalization.

February 5, 2025 at 6:39 PM

changho.bsky.social

@changho.bsky.social

7/ How can we maximize overlap density when choosing data sources?

We frame data selection as a bandit problem, using UCB (upper confidence bound) to balance exploration and exploitation across datasets. This approach identifies and prioritizes sources with high overlap density, maximizing generalization.

February 5, 2025 at 6:39 PM
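
To make the bandit framing in post 7/ concrete, here is a minimal UCB1 sketch. It is an illustration under stated assumptions, not the paper's implementation: each data source is an arm, and pulling an arm returns a reward in [0, 1]. The function name `ucb_select_sources`, the exploration constant `c`, the three densities, and the Bernoulli reward (1 when a sampled point is flagged as overlapping) are all hypothetical.

```python
import math
import random


def ucb_select_sources(reward_fns, rounds, c=2.0):
    """Run UCB1 over data sources.

    reward_fns[i]() must return a reward in [0, 1] for source i.
    Returns (counts, means): pulls per source and the running mean reward.
    """
    k = len(reward_fns)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            # Pull every source once so each has an initial estimate.
            arm = t - 1
        else:
            # Pick the source with the highest mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda i: means[i] + math.sqrt(c * math.log(t) / counts[i]),
            )
        reward = reward_fns[arm]()
        counts[arm] += 1
        # Incremental update of the running mean for the chosen source.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


# Hypothetical setup: three sources whose (unknown to the learner)
# overlap densities are 0.1, 0.3, and 0.7.
rng = random.Random(0)
densities = [0.1, 0.3, 0.7]
sources = [lambda d=d: 1.0 if rng.random() < d else 0.0 for d in densities]

counts, means = ucb_select_sources(sources, rounds=2000)
print(counts)
print([round(m, 2) for m in means])
```

In this toy setting most pulls end up on the source with the highest density, while the weaker sources still get enough pulls to rule them out.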

changho.bsky.social

@changho.bsky.social

6/ To tackle this, we propose an overlap detection algorithm that uncovers these points in real-world datasets and helps explain both the presence and absence of weak-to-strong generalization.

February 5, 2025 at 6:38 PM

changho.bsky.social

@changho.bsky.social

5/ However, identifying overlap density in real-world datasets is challenging. Overlapping points are latent!

February 5, 2025 at 6:38 PM

changho.bsky.social

@changho.bsky.social

4/ 🔑 Key insights:
- Weak models can produce accurate pseudolabels based on easy patterns.
- Strong models leverage these labels to generalize on hard patterns.
- More overlap → better generalization.

February 5, 2025 at 6:38 PM

changho.bsky.social

@changho.bsky.social

3/ The amount of these overlaps, i.e., the proportion of points containing both the easy and hard patterns, is the core quantity determining how much weak-to-strong generalization we get.

February 5, 2025 at 6:34 PM
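
The quantity described in post 3/ can be written down directly when the pattern membership of each point is known. The sketch below assumes a synthetic setting with two hypothetical boolean masks, `has_easy` and `has_hard`. In real datasets these masks are latent, so this only illustrates the definition and is not a detection method.

```python
def overlap_density(has_easy, has_hard):
    """Fraction of points that contain both the easy and the hard pattern."""
    if len(has_easy) != len(has_hard):
        raise ValueError("masks must have the same length")
    if not has_easy:
        raise ValueError("masks must be non-empty")
    # Count the points where both patterns are present.
    both = sum(1 for easy, hard in zip(has_easy, has_hard) if easy and hard)
    return both / len(has_easy)


# Hypothetical masks for five points; points 0 and 3 contain both patterns.
has_easy = [True, True, False, True, False]
has_hard = [True, False, True, True, False]
density = overlap_density(has_easy, has_hard)
print(density)  # 0.4
```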

changho.bsky.social

@changho.bsky.social

2/ The intuition is simple: generalization tracks the data points that contain both “easy” patterns (learnable by a weak model) and “challenging” patterns (learnable only by a stronger model). On such points, the weak model's predictions provide the signal a stronger model needs to learn the challenging patterns.

February 5, 2025 at 6:25 PM

changho.bsky.social

@changho.bsky.social

1/ Weak-to-strong generalization, where a strong student model surpasses its weaker teacher model, is crucial for achieving 'superintelligence'. We propose a mechanism explaining when and why this happens.

February 5, 2025 at 6:22 PM
