We find evidence of pretty advanced structures in latent space, such as a tendency to use orbitals (see picture) when computing arithmetic and when reasoning about sentence structure
So, this model really is rotating shapes in a high-dimensional space?
In this figure, the model takes more time to think about the key parts of the text:
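As a rough sketch of what "taking more time" means mechanically: the latent state can be iterated per token until it stops changing, and the step count is what varies across the text. The convergence criterion, threshold, and `core` module below are illustrative assumptions, not the exact test-time exit rule from the report.

```python
import torch

def adaptive_latent_steps(core, e, max_steps=64, tol=1e-3):
    """Iterate a recurrent core on latent state s until it stops changing,
    tracking how many steps each token needed. The relative-L2 criterion
    and threshold here are illustrative assumptions."""
    s = torch.randn_like(e)                                  # random initial latent state
    steps = torch.zeros(e.shape[:-1], dtype=torch.long)      # per-token step counter
    converged = torch.zeros(e.shape[:-1], dtype=torch.bool)
    for _ in range(max_steps):
        s_next = core(s + e)                                 # one latent "thinking" step
        delta = (s_next - s).norm(dim=-1) / s.norm(dim=-1).clamp_min(1e-8)
        converged |= delta < tol
        steps += (~converged).long()                         # unconverged tokens keep thinking
        s = s_next
    return s, steps  # tokens with larger `steps` received more latent compute
```

Plotting `steps` over the input is roughly the kind of picture shown here: more iterations on the harder or more important tokens.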
On reasoning tasks like GSM8k, the model is pretty competitive with other open-source pretrained models, even though we have done no mid- or post-training...
This is pretty exciting for our first large-scale run
We spent the last year (actually a bit longer) training an LLM with recurrent depth at scale.
The model has an internal latent space in which it can adaptively spend more compute to think longer.
I think the tech report ...🐦⬛
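To make "recurrent depth" concrete, here is a minimal sketch of the idea, assuming the usual prelude/core/coda split; the module names, sizes, and the simple `s + e` input injection are illustrative assumptions, not the released model's actual architecture or API.

```python
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    """Toy depth-recurrent LM: one block of weights is applied repeatedly
    in latent space, so compute can scale at test time without adding parameters."""
    def __init__(self, vocab_size=32000, d_model=512, nhead=8):
        super().__init__()
        self.prelude = nn.Embedding(vocab_size, d_model)        # map tokens into latent space
        self.core = nn.TransformerEncoderLayer(d_model, nhead,  # the recurrent block
                                               batch_first=True)
        self.coda = nn.Linear(d_model, vocab_size)              # map latent state back to logits

    def forward(self, tokens, num_steps=8):
        e = self.prelude(tokens)            # input embedding, injected at every step
        s = torch.randn_like(e)             # random initial latent state
        for _ in range(num_steps):          # more steps = longer "thinking" in latent space
            s = self.core(s + e)            # same weights reused each iteration
            # (a real LM would also apply causal masking; omitted to keep the sketch short)
        return self.coda(s)

model = RecurrentDepthLM()
tokens = torch.randint(0, 32000, (1, 16))
logits_light = model(tokens, num_steps=4)   # cheap pass
logits_deep = model(tokens, num_steps=32)   # same parameters, more latent compute
```

The point of the design is that `num_steps` is a knob you can turn at inference time, which is what "adaptively spend more compute to think longer" refers to.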