Sergei Vassilvitskii
@vsergei.bsky.social
Algorithms, predictions, privacy.
https://theory.stanford.edu/~sergei/
Synthetic data is all the rage in LLM training, but why does it work? In arxiv.org/abs/2502.08924 we show how to analyze this question through the lens of boosting. Unlike in classical boosting, however, the assumptions on the data and on the learning method are inverted.
Escaping Collapse: The Strength of Weak Data for Large Language Model Training
Synthetically-generated data plays an increasingly larger role in training large language models. However, while synthetic data has been found to be useful, studies have also shown that without proper...
arxiv.org
February 14, 2025 at 1:48 PM
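A minimal toy sketch of the boosting-flavored intuition (not the paper's construction; all names, rates, and the update rule are illustrative assumptions): each round the current model generates synthetic answers, a weak curator keeps correct answers only slightly more often than incorrect ones, and retraining moves the model toward the curated mix. Even that weak preference is enough for accuracy to ratchet upward instead of collapsing.

```python
# Toy simulation of iterative training on weakly curated synthetic data.
# Hypothetical parameters and dynamics; not the algorithm from the paper.
import random

random.seed(0)

ROUNDS = 15
SAMPLES_PER_ROUND = 5000
P_KEEP_CORRECT = 0.55     # weak curator: barely prefers correct answers
P_KEEP_INCORRECT = 0.45   # ...to incorrect ones

accuracy = 0.30           # initial model accuracy on the task of interest

for rnd in range(1, ROUNDS + 1):
    kept_correct = kept_total = 0
    for _ in range(SAMPLES_PER_ROUND):
        correct = random.random() < accuracy            # model generates an answer
        keep_prob = P_KEEP_CORRECT if correct else P_KEEP_INCORRECT
        if random.random() < keep_prob:                 # weak curation step
            kept_total += 1
            kept_correct += correct
    if kept_total == 0:
        continue
    # Idealized retraining: the new model matches the quality of its curated data,
    # which exceeds the old accuracy whenever the curator is (even weakly) informative.
    accuracy = kept_correct / kept_total
    print(f"round {rnd:2d}: model accuracy {accuracy:.3f}")
```

With P_KEEP_CORRECT equal to P_KEEP_INCORRECT the curated mix carries no signal and accuracy stalls; any gap between the two acts like a weak-learner advantage, and iterating the loop amplifies it, which is the boosting analogy in spirit.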