Charlie Snell
@seasnell.bsky.social
PhD @berkeley_ai; prev SR @GoogleDeepMind. I stare at my computer a lot and make things
All model checkpoints we used for this research are also available here: https://huggingface.co/openlm-research
November 26, 2024 at 10:37 PM
This was a fun project with Eric Wallace, Dan Klein, and Sergey Levine.
An early version of this work also appeared in COLM 2024.

Paper link: arxiv.org/abs/2411.16035
Predicting Emergent Capabilities by Finetuning
November 26, 2024 at 10:37 PM
Finally, we present a case study of two real-world uses for emergence prediction:

1) cheaply assessing pretraining data quality (left).

2) predicting more complex capabilities, closer to those of future frontier models, using the difficult APPS coding benchmark (right).
November 26, 2024 at 10:37 PM
We validate our emergence law using four standard NLP benchmarks where large-scale open-source LLMs already demonstrate emergence, so we can easily check our predictions.

We find that our emergence law can accurately predict the point of emergence up to 4x the FLOPs in advance.
November 26, 2024 at 10:37 PM
To operationalize this insight, we finetune LLMs on varying amounts of data and fit a parametric function (i.e., “emergence law”) which models how the point of emergence shifts with the amount of data. We can then extrapolate a prediction for emergence in the few-shot setting.
November 26, 2024 at 10:37 PM
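To make that recipe concrete, here is a minimal sketch in Python, assuming (purely for illustration) that the point of emergence shifts logarithmically with the amount of finetuning data; the functional form, the scipy fit, and every number below are hypothetical stand-ins, not the paper's fitted law:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical measurements: the point of emergence (in pretraining-loss
# space) observed after finetuning on different amounts of task data.
n_finetune = np.array([500, 2000, 8000, 32000])
emergence_loss = np.array([2.45, 2.60, 2.74, 2.91])

def emergence_law(n, fewshot_loss, k):
    """Assumed parametric form: the emergence point shifts logarithmically
    with data, so n = 0 recovers the few-shot (no-finetuning) point."""
    return fewshot_loss + k * np.log1p(n)

params, _ = curve_fit(emergence_law, n_finetune, emergence_loss)
fewshot_loss, k = params
print(f"predicted few-shot emergence near pretraining loss {fewshot_loss:.2f}")
```

Extrapolating the fit back to zero finetuning data yields the predicted few-shot point of emergence, i.e., the pretraining loss a model would need to reach before the capability appears without any task-specific training.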
We then discover a simple insight for this problem:

finetuning LLMs on a given task can shift the point in scaling at which emergence occurs towards less capable LLMs, and the magnitude of this shift is modulated by the amount of finetuning data.
November 26, 2024 at 10:37 PM
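One toy picture of this shift (the sigmoid shape and the logarithmic dependence on data are assumptions for illustration, not claims about the paper's exact curves): few-shot accuracy rises from chance as pretraining loss falls below an emergence midpoint, and finetuning on more data moves that midpoint toward higher loss, i.e., toward less capable models.

```python
import numpy as np

def accuracy_curve(loss, midpoint, chance=0.25, ceiling=1.0, sharpness=8.0):
    """Toy sigmoid: accuracy climbs from chance toward the ceiling as
    pretraining loss drops below the emergence midpoint."""
    return chance + (ceiling - chance) / (1.0 + np.exp(sharpness * (loss - midpoint)))

def shifted_midpoint(base_midpoint, n_finetune, k=0.1):
    """Assumed shift: more finetuning data moves emergence toward higher
    pretraining loss (weaker models), roughly logarithmically."""
    return base_midpoint + k * np.log1p(n_finetune)
```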
We first pose the task of emergence prediction:

given access to LLMs that have only random-chance few-shot accuracy on a task, can we predict the point in scaling (e.g., pretraining loss) at which performance will jump above random chance?
November 26, 2024 at 10:37 PM
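For concreteness, one minimal way to operationalize "the point of emergence" on observed checkpoints; the chance level and the margin below are illustrative assumptions, not the paper's protocol:

```python
import numpy as np

def observed_emergence_point(losses, accuracies, chance=0.25, margin=0.02):
    """Return the highest pretraining loss at which few-shot accuracy
    clearly exceeds random chance, i.e., the weakest checkpoint for which
    the capability has emerged; None if no checkpoint has emerged yet."""
    losses = np.asarray(losses)
    accuracies = np.asarray(accuracies)
    emerged = accuracies > chance + margin
    if not emerged.any():
        return None
    return losses[emerged].max()

# Emergence prediction asks: given only checkpoints where `emerged` is
# False everywhere (all still at chance), predict this value in advance.
```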