Luisa Zintgraf
@luisazintgraf.bsky.social
RL & Meta-Learning @ DeepMind.
Huge shout-out to my co-first authors @dancalian.bsky.social, @gregfar.bsky.social, & Iurii Kemaev.

And to our amazing collaborators: Matteo Hessel, Jeremy Shar, Junhyuk Oh, András György, @schaul.bsky.social, @jeffdean.bsky.social, Hado van Hasselt, & Dave Silver.
November 6, 2025 at 11:29 AM
We believe that the DataRater is a promising step towards more automated and principled dataset curation. This could be especially important for filtering and making the best use of massive synthetic datasets in the future.

For a deeper dive, check out arxiv.org/pdf/2505.17895
November 6, 2025 at 11:29 AM
So what does the DataRater learn? It automatically identifies and down-weights data that aligns with human intuitions of low quality, such as incorrect text encodings, OCR errors, and irrelevant content.
November 6, 2025 at 11:29 AM
The result? The DataRater is highly effective at filtering data, leading to significant compute efficiency improvements. In our experiments, we observed up to a 46.6% net compute gain while often improving final model performance.
November 6, 2025 at 11:29 AM
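The post above reports gains from filtering but does not spell out how ratings become a filter. One simple scheme is to score every example in a batch and drop a fixed fraction of the lowest-rated ones. This is sketched here as an assumption, not as the paper's exact procedure. `filter_batch`, `discard_fraction` and the scores are hypothetical.

```python
def filter_batch(batch, scores, discard_fraction):
    """Keep the highest-scoring examples, dropping `discard_fraction` of the batch.

    Hypothetical helper: `scores` stands in for per-example rater outputs.
    """
    if not 0.0 <= discard_fraction < 1.0:
        raise ValueError("discard_fraction must be in [0, 1)")
    n_keep = len(batch) - int(len(batch) * discard_fraction)
    # Rank example indices from highest to lowest score.
    ranked = sorted(range(len(batch)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:n_keep])  # restore the original batch order
    return [batch[i] for i in keep]


batch = ["clean prose", "mis-encoded text", "OCR-garbled scan", "relevant paragraph"]
scores = [2.1, -1.5, -0.3, 1.7]  # hypothetical rater outputs
print(filter_batch(batch, scores, 0.5))  # ['clean prose', 'relevant paragraph']
```

Because discarded examples never reach the training step, compute is spent only on the retained fraction. That is the intuition behind a compute saving, though how the thread's net figure is accounted for is not described here.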
We introduce the DataRater, a meta-learning method that learns to rate the value of each data point for training. Instead of manually specifying filtering rules, we train the DataRater to optimize for a simple goal: improving the training efficiency on a held-out dataset.
November 6, 2025 at 11:29 AM
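To make the meta-learning idea concrete, here is a toy sketch in plain Python of meta-learned data weighting. It is not the paper's implementation. It assumes a one-parameter linear model, one learnable logit per training example standing in for the DataRater network, and a single weighted training step per meta-update. The rater's logits are updated by differentiating the held-out loss *after* that weighted step, so examples whose gradients help on held-out data gain weight. Names such as `meta_grad` and `train_rater` and all hyperparameters are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def per_example_grads(w, data):
    """d/dw of the squared error (w * x - y) ** 2 for each (x, y) pair."""
    return [2.0 * x * (w * x - y) for x, y in data]


def held_out_loss(w, data):
    """Mean squared error of the scalar model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def meta_grad(logits, w0, train, val, alpha):
    """Held-out loss after one weighted training step, and its gradient w.r.t. the logits.

    Inner step: w1 = w0 - alpha * sum_i p_i * g_i, where p = softmax(logits).
    """
    p = softmax(logits)
    g = per_example_grads(w0, train)
    g_bar = sum(pk * gk for pk, gk in zip(p, g))
    w1 = w0 - alpha * g_bar
    # dL_val/dw1 for the mean squared error on the held-out set.
    dval_dw1 = sum(2.0 * x * (w1 * x - y) for x, y in val) / len(val)
    # Softmax Jacobian: d(sum_i p_i * g_i) / d logit_k = p_k * (g_k - g_bar).
    grads = [dval_dw1 * (-alpha) * pk * (gk - g_bar) for pk, gk in zip(p, g)]
    return held_out_loss(w1, val), grads


def train_rater(train, val, alpha=0.01, beta=0.2, steps=300, w0=0.0):
    """Meta-learn one logit per training example; returns the learned weights."""
    logits = [0.0] * len(train)  # start from uniform weights
    for _ in range(steps):
        # Each meta-step restarts the inner model from w0 (a simplification).
        _, grads = meta_grad(logits, w0, train, val, alpha)
        logits = [z - beta * gz for z, gz in zip(logits, grads)]
    return softmax(logits)


# Toy data: the true relation is y = 2x; the last three labels are sign-flipped.
clean = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
corrupted = [(1.0, -2.0), (2.0, -4.0), (3.0, -6.0)]
held_out = [(1.5, 3.0), (2.5, 5.0)]

weights = train_rater(clean + corrupted, held_out)
for (x, y), p in zip(clean + corrupted, weights):
    print(f"x={x:.0f} y={y:+.0f} weight={p:.3f}")
```

In this toy run the sign-flipped examples end up below the uniform weight of 1/6. Among the clean examples, the sketch tends to favour x = 3, whose gradient moves the model furthest toward the held-out optimum in one step. A realistic system would replace the hand-derived gradient with an autodiff library and differentiate through more than one training step.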
Foundation models are trained on large datasets, but not all data is created equal. Dataset curation often relies on manual, coarse-grained filtering and hand-crafted rules. This is becoming a major challenge, especially with the rise of synthetic data.
November 6, 2025 at 11:29 AM
Tagging first author @jakeabeck.bsky.social who just joined bsky! Welcome 🎉
April 9, 2025 at 2:22 PM