Keyon Vafa
@keyonv.bsky.social
Postdoctoral fellow at Harvard Data Science Initiative | Former computer science PhD at Columbia University | ML + NLP + social sciences
https://keyonvafa.com
Inductive bias probes can test this hypothesis more generally.

Models are much likelier to conflate two separate states when they share the same legal next-tokens.
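
As an illustration, here is roughly how one could quantify that conflation (a minimal sketch; probe_similarity, legal_next_tokens, and the threshold are hypothetical stand-ins, not the paper's actual code):

    def conflation_rates(state_pairs, probe_similarity, legal_next_tokens, threshold=0.9):
        # Split pairs of distinct states by whether they admit the same legal
        # next-tokens, then measure how often the probe treats each kind of
        # pair as the same state.
        same_tokens, diff_tokens = [], []
        for s1, s2 in state_pairs:
            conflated = probe_similarity(s1, s2) > threshold
            if legal_next_tokens(s1) == legal_next_tokens(s2):
                same_tokens.append(conflated)
            else:
                diff_tokens.append(conflated)
        return sum(same_tokens) / len(same_tokens), sum(diff_tokens) / len(diff_tokens)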
July 14, 2025 at 1:50 PM
We fine-tune an Othello next-token prediction model to reconstruct boards.

Even when the model reconstructs boards incorrectly, the reconstructed boards often get the legal next moves right.

Models seem to construct "enough of" the board to compute individual next moves.
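
One way to read that finding, in sketch form (legal_moves is a hypothetical helper implementing Othello's rules, not the paper's code):

    def board_vs_move_accuracy(true_board, reconstructed_board, legal_moves):
        # The reconstructed board can differ square-by-square from the truth
        # while still implying exactly the same set of legal next moves.
        board_exact = true_board == reconstructed_board   # e.g. boards as tuples of tuples
        moves_exact = legal_moves(true_board) == legal_moves(reconstructed_board)
        return board_exact, moves_exact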
July 14, 2025 at 1:50 PM
We also apply these probes to lattice problems (think gridworld).

Inductive biases are great when the number of states is small. But they deteriorate quickly as the number of states grows.

Recurrent and state-space models like Mamba consistently have better inductive biases than transformers.
July 14, 2025 at 1:50 PM
Would more general models like LLMs do better?

We tried providing o3, Claude Sonnet 4, and Gemini 2.5 Pro with a small number of force magnitudes in-context w/o saying what they are.

These LLMs are explicitly trained on Newton's laws. But they can't recover the remaining force magnitudes.
July 14, 2025 at 1:50 PM
We then fine-tuned the model at a larger scale to predict forces across 10K solar systems.

We used symbolic regression to compare the recovered force law to Newton's law.

It not only recovered a nonsensical law; it recovered different laws for different solar systems.
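
For concreteness, the comparison step could look something like this (a sketch assuming the PySR library; the synthetic data below stands in for the fine-tuned model's predicted force magnitudes):

    import numpy as np
    from pysr import PySRRegressor

    rng = np.random.default_rng(0)
    m1, m2 = rng.uniform(1, 10, 500), rng.uniform(1, 10, 500)   # masses
    r = rng.uniform(0.5, 5, 500)                                 # separations
    y = m1 * m2 / r**2    # stand-in for the transformer's predicted magnitudes
    X = np.column_stack([m1, m2, r])

    reg = PySRRegressor(niterations=100, binary_operators=["+", "-", "*", "/"])
    reg.fit(X, y)
    print(reg)   # a Newtonian model should yield something ~ m1 * m2 / r**2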
July 14, 2025 at 1:50 PM
To demonstrate, we fine-tuned the model to predict force vectors on a small dataset of planets in our solar system.

A model that understands Newtonian mechanics should get these. But the transformer struggles.
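
For reference, the target here is Newton's law of gravitation; in LaTeX, the force on body i from body j is

    \mathbf{F}_{ij} = -\,G\,\frac{m_i\,m_j}{\lVert \mathbf{r}_{ij} \rVert^{2}}\,\hat{\mathbf{r}}_{ij},
    \qquad \mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j .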
July 14, 2025 at 1:50 PM
But has the model discovered Newton's laws?

When we fine-tune it on new tasks, its inductive bias isn't toward Newtonian states.

When it extrapolates, it makes similar predictions for orbits with very different states, and different predictions for orbits with similar states.
July 14, 2025 at 1:50 PM
We apply these probes to orbital, lattice, and Othello problems.

Starting with orbits: we encode solar systems as sequences and train a transformer on 10M solar systems (20B tokens).

The model makes accurate predictions many timesteps ahead. Predictions for our solar system:
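
To give a flavor of what "encode solar systems as sequences" could mean (an illustrative sketch only; the paper's actual tokenization may differ):

    import numpy as np

    def encode_orbit(positions, n_bins=1024, lo=-50.0, hi=50.0):
        # positions: array of shape (timesteps, n_planets, 2) with (x, y) coordinates.
        # Discretize each coordinate into one of n_bins tokens and flatten over
        # time and planets, giving one long token sequence per solar system.
        bins = np.linspace(lo, hi, n_bins - 1)
        return np.digitize(positions, bins).reshape(-1).tolist()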
July 14, 2025 at 1:50 PM
We propose a method to measure these inductive biases. We call it an inductive bias probe.

Two steps:
1. Fit a foundation model to many new, very small synthetic datasets
2. Analyze patterns in the functions it learns to find the model's inductive bias
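
In code, the two steps might look roughly like this (a minimal sketch under my own assumptions; the fine_tune callable and the correlation-based summary are illustrative, not the paper's exact procedure):

    import numpy as np

    def inductive_bias_probe(fine_tune, synthetic_datasets, test_inputs):
        # fine_tune: callable mapping a tiny dataset to a fitted predictor.
        predictions = []
        for dataset in synthetic_datasets:
            # Step 1: fit the pretrained model to a handful of (input, label) pairs.
            model = fine_tune(dataset)
            predictions.append([model(x) for x in test_inputs])
        preds = np.array(predictions, dtype=float)   # (n_datasets, n_test_inputs)

        # Step 2: inputs that receive correlated predictions across many small
        # datasets are ones the model is biased to treat as the same state.
        return np.corrcoef(preds.T)   # (n_test_inputs, n_test_inputs) similarity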
July 14, 2025 at 1:50 PM
Perhaps the most influential world model had its start as a predictive model.

Before we had Newton's laws of gravity, we had Kepler's predictions of planetary orbits.

Kepler's predictions led to Newton's laws. So what did Newton add?
July 14, 2025 at 1:50 PM
Our paper aims to answer two questions:

1. What's the difference between prediction and world models?
2. Are there straightforward metrics that can test this distinction?

Our paper is about AI. But it's helpful to go back 400 years to answer these questions.
July 14, 2025 at 1:50 PM
Can an AI model predict perfectly and still have a terrible world model?

What would that even mean?

Our new ICML paper (poster tomorrow!) formalizes these questions.

One result tells the story: A transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵
July 14, 2025 at 1:50 PM
Our paper proposes new metrics for world model recovery based on the Myhill-Nerode theorem from language theory:
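
The Myhill-Nerode idea, informally: two histories correspond to the same underlying state exactly when they admit the same valid continuations. A minimal sketch of that check (the names and the is_valid oracle are mine, not the paper's):

    def same_state(prefix_a, prefix_b, continuations, is_valid):
        # Myhill-Nerode-style test: two prefixes are equivalent iff every sampled
        # continuation is valid after one exactly when it is valid after the other.
        # is_valid(seq) is an oracle for the true world model.
        return all(is_valid(prefix_a + c) == is_valid(prefix_b + c) for c in continuations)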

Co-authors: Justin Chen, Ashesh Rambachan, Jon Kleinberg, Sendhil Mullainathan (@sendhil.bsky.social)
December 12, 2024 at 6:59 PM
Our paper asks: how can we tell if a transformer has the right world model?

We trained a transformer to predict directions for NYC taxi rides. The model was good. It could find shortest paths between new points.

But had it built a map of NYC? We reconstructed its map and found incoherence:
December 12, 2024 at 6:59 PM