Phillip Isola
phillipisola.bsky.social
Associate Professor in EECS at MIT. Neural nets, generative models, representation learning, computer vision, robotics, cog sci, AI.

https://web.mit.edu/phillipi/
Suppose you have separate datasets X, Y, Z, without known correspondences.

We do the simplest thing: just train a model (e.g., a next-token predictor) on all elements of the concatenated dataset [X,Y,Z].

You end up with a better model of dataset X than if you had trained on X alone!
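A minimal sketch of that recipe, with made-up toy datasets and a bigram counter standing in for a real next-token predictor:

```python
from collections import Counter, defaultdict

# Toy unpaired datasets: token sequences with no known correspondences
# (contents are made up purely for illustration).
X = [[1, 2, 3, 4], [2, 3, 4, 5]]
Y = [[10, 11, 12], [11, 12, 13]]
Z = [[20, 21, 22], [21, 22, 23]]

# The simplest thing: train one model on the concatenated dataset [X, Y, Z].
combined = X + Y + Z

# A bigram counter as a stand-in for a next-token predictor.
counts = defaultdict(Counter)
for seq in combined:
    for prev, nxt in zip(seq, seq[1:]):
        counts[prev][nxt] += 1

def predict_next(token):
    """Most likely next token under the toy model, or None if unseen."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]
```

The claim in the post is about real models: the version trained on the concatenation ends up modeling X better than a version trained on X alone.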

6/9
October 10, 2025 at 10:13 PM
(Note: these thread posts appear in feed order, most recent first.)
In “Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models,” we study a question I’ve wanted to make progress on for years: can you learn useful multimodal representations from *unpaired* data?

5/9
October 10, 2025 at 10:13 PM
In “Words That Make Language Models Perceive,” we find that if you ask an LLM to “imagine seeing,” its representation of the text becomes more like how a vision system would represent that same scene.

If you ask it to “imagine hearing,” its representation becomes more like that of an auditory model.
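One standard way to quantify how "alike" two models' representations are is a similarity index such as linear CKA, computed on the two models' embeddings of the same inputs (an illustrative sketch; the paper's exact metric may differ):

```python
import numpy as np

def linear_cka(A, B):
    """Linear CKA between two representation matrices of shape
    (n_samples, n_features); 1.0 means identical up to rotation/scale."""
    A = A - A.mean(axis=0)  # center each feature
    B = B - B.mean(axis=0)
    hsic = np.linalg.norm(A.T @ B, "fro") ** 2
    return hsic / (np.linalg.norm(A.T @ A, "fro") * np.linalg.norm(B.T @ B, "fro"))
```

Under a measure like this, the finding corresponds to the prompted LLM's text embeddings scoring higher against a vision (or audio) model's embeddings than the unprompted ones.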

3/9
October 10, 2025 at 10:13 PM
For context, this work stems from the idea that all data modalities (images, sounds, text, etc.) are views of the same underlying world, and that treating them as such is useful.

We are interested in identifying commonalities between different models and modalities, and in unifying them.

2/9
October 10, 2025 at 10:13 PM
Interesting reaction from ChatGPT to the HHS mRNA memo. It finds it so implausible that it thinks it's fake. From the perspective of a ~2024(?) trained model, 2025 policies are so absurd as to be unbelievable...

chatgpt.com/share/689364...
August 6, 2025 at 2:40 PM
Last, a talk on "The Platonic Representation Hypothesis" (arxiv.org/abs/2405.07987) at UniReps.

This talk says: don't worry, we don't need to equip NNs with special distance measures, these structures emerge "for free" at scale!

I've given this one before, but there will be a bit that's new.
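A rough sketch of one way representational alignment gets measured in this line of work: mutual k-nearest-neighbor overlap between two models' embedding spaces (illustrative only; details differ from the paper's exact metric):

```python
import numpy as np

def mutual_knn_alignment(A, B, k=2):
    """Average overlap of k-nearest-neighbor sets computed separately in two
    representation spaces A and B (rows are the same inputs embedded by two
    different models); 1.0 means identical neighborhood structure."""
    def knn_indices(X):
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)  # exclude self-matches
        return np.argsort(d, axis=1)[:, :k]
    na, nb = knn_indices(A), knn_indices(B)
    return float(np.mean([len(set(r1) & set(r2)) / k for r1, r2 in zip(na, nb)]))
```

The hypothesis is that this kind of alignment score rises with model scale, across modalities, without any special machinery.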
December 10, 2024 at 6:52 PM
Next, we have "Scalable Optimization in the Modular Norm", aka "Modula" (arxiv.org/abs/2405.14813)

This one is about distance between settings of the *weights*.

Its answer: incorporate knowledge about the neural net *architecture* into how you measure distance.
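As a caricature of "architecture-aware distance" (the paper's modular norm is defined recursively over the module tree; this sketch just scales per-layer spectral norms and takes a max, with made-up scales):

```python
import numpy as np

def architecture_aware_distance(weights_a, weights_b, layer_scales):
    """Illustrative weight-space distance: a max over layers of scaled
    spectral norms of the per-layer weight differences. The scales are
    where knowledge of the architecture (fan-in, depth, etc.) would enter."""
    return max(
        scale * np.linalg.norm(wa - wb, 2)  # spectral norm for 2-D weights
        for wa, wb, scale in zip(weights_a, weights_b, layer_scales)
    )
```

Measuring weight-space distance this way, rather than with a flat Euclidean norm over all parameters, is what lets learning rates transfer across widths and depths.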
December 10, 2024 at 6:52 PM
First up, "When Does Perceptual Alignment Benefit Vision Representations?" (arxiv.org/abs/2410.10817)

This paper is about distance between embeddings.

It says: measure how humans perceive distance, then adjust a neural net to match.

This improves transfer to lots of tasks (but not all tasks).
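One common way to "adjust a neural net to match" human similarity judgments is a triplet objective on its embeddings; a hedged sketch (the margin and names are illustrative, not the paper's exact loss):

```python
import numpy as np

def perceptual_triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge loss that is zero when the image humans judged more similar to
    the anchor (positive) is already closer in embedding space than the
    other image (negative) by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing a loss like this over human-labeled triplets nudges the embedding's distances toward perceived distances.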
December 10, 2024 at 6:52 PM
At NeurIPS this year my lab is sharing a few papers and talks.

They are all about the following question: how to characterize the geometry of deep learning problems, and in particular how to measure *distance*?

Each paper/talk gives a rather different answer, detailed below:
December 10, 2024 at 6:52 PM
Making a lecture on inference methods for deep nets. Here is my attempt at mapping out the interplay between training and inference.

A few items I wasn't sure where to put. You could break it down differently. What did I get wrong?
December 3, 2024 at 3:55 AM
Sharing some new work!

A big dream in AI is to create world models of sufficient quality that you can train agents within them.

Classic simulators lack visual diversity and realism. GenAI lacks physical accuracy. But combining the two can work pretty well!

Paper: arxiv.org/abs/2411.00083
November 14, 2024 at 1:30 AM