1) cheaply assessing pretraining data quality (left).
2) predicting more complex capabilities, closer to those of future frontier models, using the difficult APPS coding benchmark (right).
We find that our emergence law can accurately predict the point of emergence up to 4x the FLOPs in advance.
Finetuning LLMs on a given task can shift the point in scaling at which emergence occurs towards less capable LLMs, and the magnitude of this shift is modulated by the amount of finetuning data.
We propose a method for doing exactly this in our paper “Predicting Emergent Capabilities by Finetuning”🧵