This really was a long project for us, with the first experiments starting in Summer '23!
The code here: github.com/seal-rg/recu...
and the tech report here: www.arxiv.org/abs/2502.05171
We find evidence for pretty advanced structures in latent space, such as the tendency to use orbitals (see picture) when computing arithmetic and when reasoning about sentence structure.
So, this model really is rotating shapes in a high-dimensional space?
In this figure, the model takes more time to think about the key parts of the text.
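The thread doesn't spell out how the model decides how long to think about each token. As a rough toy illustration only, here is a sketch that assumes a simple stopping rule: keep applying a recurrent update to a token's latent state and stop once successive states barely change. This rule, and the names `step`, `think`, `rate`, and `tol`, are hypothetical choices for illustration, not the paper's exact criterion or code.

```python
import math


def step(state, x, rate):
    # Hypothetical recurrent update (an assumption, not the real model):
    # move the latent state toward the token's input x by a fraction `rate`.
    return [s + rate * (xi - s) for s, xi in zip(state, x)]


def think(x, rate, tol=1e-3, max_steps=64):
    """Iterate the toy recurrent block until the latent state stops changing.

    Hypothetical stopping rule: halt once the Euclidean distance between
    successive states drops below `tol`, or after `max_steps` iterations.
    Returns (final_state, steps_used).
    """
    state = [0.0] * len(x)
    for k in range(1, max_steps + 1):
        new_state = step(state, x, rate)
        delta = math.dist(new_state, state)  # size of this iteration's change
        state = new_state
        if delta < tol:
            return state, k
    return state, max_steps


# A smaller `rate` stands in for a "harder" token: it converges more slowly.
_, easy_steps = think([1.0, 1.0], rate=0.9)
_, hard_steps = think([1.0, 1.0], rate=0.2)
print(easy_steps, hard_steps)
```

In this toy setup the "harder" token simply needs more iterations before its latent state settles, which is one way to picture a model spending more compute on some parts of the text than others.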
On reasoning tasks like GSM8K, the model is pretty competitive, even against other open-source pretrained models, although we have done no post- or mid-training...
This is pretty exciting for our first large-scale run!
Here are a few of my highlights: