We find evidence of pretty advanced structures in latent space, such as a tendency to use orbitals (see picture) when computing arithmetic and when reasoning about sentence structure
So, this model really is rotating shapes in a high-dimensional space?
In this figure, the model takes more time to think about the key parts of the text:
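As a rough sketch of what "taking more time" means mechanically: the latent state can be iterated per token until it stops changing, and the step count is what varies across the text. The convergence criterion, threshold, and `core` module below are illustrative assumptions, not the exact test-time exit rule from the report.

```python
import torch

def adaptive_latent_steps(core, e, max_steps=64, tol=1e-3):
    """Iterate a recurrent core on latent state s until it stops changing,
    tracking how many steps each token needed. The relative-L2 criterion
    and threshold here are illustrative assumptions."""
    s = torch.randn_like(e)                                  # random initial latent state
    steps = torch.zeros(e.shape[:-1], dtype=torch.long)      # per-token step counter
    converged = torch.zeros(e.shape[:-1], dtype=torch.bool)
    for _ in range(max_steps):
        s_next = core(s + e)                                 # one latent "thinking" step
        delta = (s_next - s).norm(dim=-1) / s.norm(dim=-1).clamp_min(1e-8)
        converged |= delta < tol
        steps += (~converged).long()                         # unconverged tokens keep thinking
        s = s_next
    return s, steps  # tokens with larger `steps` received more latent compute
```

Plotting `steps` over the input is roughly the kind of picture shown here: more iterations on the harder or more important tokens.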
On reasoning tasks like GSM8k, the model is pretty competitive with other open-source pretrained models, even though we have done no mid- or post-training...
This is pretty exciting for our first large-scale run
We spent the last year (actually a bit longer) training an LLM with recurrent depth at scale.
The model has an internal latent space in which it can adaptively spend more compute to think longer.
I think the tech report ...🐦⬛
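To make "recurrent depth" concrete, here is a minimal sketch of the idea, assuming the usual prelude/core/coda split; the module names, sizes, and the simple `s + e` input injection are illustrative assumptions, not the released model's actual architecture or API.

```python
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    """Toy depth-recurrent LM: one block of weights is applied repeatedly
    in latent space, so compute can scale at test time without adding parameters."""
    def __init__(self, vocab_size=32000, d_model=512, nhead=8):
        super().__init__()
        self.prelude = nn.Embedding(vocab_size, d_model)        # map tokens into latent space
        self.core = nn.TransformerEncoderLayer(d_model, nhead,  # the recurrent block
                                               batch_first=True)
        self.coda = nn.Linear(d_model, vocab_size)              # map latent state back to logits

    def forward(self, tokens, num_steps=8):
        e = self.prelude(tokens)            # input embedding, injected at every step
        s = torch.randn_like(e)             # random initial latent state
        for _ in range(num_steps):          # more steps = longer "thinking" in latent space
            s = self.core(s + e)            # same weights reused each iteration
            # (a real LM would also apply causal masking; omitted to keep the sketch short)
        return self.coda(s)

model = RecurrentDepthLM()
tokens = torch.randint(0, 32000, (1, 16))
logits_light = model(tokens, num_steps=4)   # cheap pass
logits_deep = model(tokens, num_steps=32)   # same parameters, more latent compute
```

The point of the design is that `num_steps` is a knob you can turn at inference time, which is what "adaptively spend more compute to think longer" refers to.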