@changho.bsky.social
Ph.D. student at @WisconsinCS @UWMadison
7/ How can we maximize overlap density when choosing data sources?

We frame data selection as a bandit problem, using UCB to balance exploration and exploitation across datasets. This strategically identifies and prioritizes sources with high overlap density, maximizing generalization.
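
For readers who want to see the bandit framing concretely, here is a rough sketch of textbook UCB1 over data sources. It is not the paper's implementation. It assumes each source is an arm and that pulling an arm returns a noisy 0/1 signal for whether a sampled point is an overlap point. The names `ucb_select_sources` and `make_bernoulli_sources`, the exploration constant `c`, and the simulated densities are all hypothetical.

```python
import math
import random


def make_bernoulli_sources(densities, seed=0):
    """Simulated sources (an assumption for illustration): pulling source k
    returns 1.0 with probability densities[k], standing in for
    'the sampled point is an overlap point'."""
    rng = random.Random(seed)

    def sample_reward(k):
        return 1.0 if rng.random() < densities[k] else 0.0

    return sample_reward


def ucb_select_sources(sample_reward, num_sources, num_rounds, c=2.0):
    """Standard UCB1: try every source once, then repeatedly pick the source
    with the highest (empirical mean + exploration bonus)."""
    counts = [0] * num_sources   # how often each source was sampled
    means = [0.0] * num_sources  # running estimate of each source's overlap density

    for t in range(1, num_rounds + 1):
        if t <= num_sources:
            arm = t - 1  # initialization: sample each source once
        else:
            arm = max(
                range(num_sources),
                key=lambda k: means[k] + math.sqrt(c * math.log(t) / counts[k]),
            )
        reward = sample_reward(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean

    return counts, means


# Usage: three hypothetical sources with different (unknown to the learner) densities.
sample_reward = make_bernoulli_sources([0.1, 0.3, 0.8], seed=0)
counts, means = ucb_select_sources(sample_reward, num_sources=3, num_rounds=2000)
print("samples drawn per source:", counts)
print("estimated overlap density:", [round(m, 2) for m in means])
```

The exploration bonus shrinks as a source is sampled more often, so most of the sampling budget should end up on the source whose estimated overlap density is highest, while the others still get checked occasionally.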
February 5, 2025 at 6:39 PM
6/ To tackle this, we propose an overlap detection algorithm that uncovers these points in real-world datasets and helps explain both the presence and absence of weak-to-strong generalization.
February 5, 2025 at 6:38 PM
What enables a strong model to surpass its weaker teacher?

🚀 Excited to share our ICLR 2025 paper: "Weak-to-Strong Generalization Through the Data-Centric Lens"! 🧵
February 5, 2025 at 6:22 PM