Phillip Isola
phillipisola.bsky.social
Associate Professor in EECS at MIT. Neural nets, generative models, representation learning, computer vision, robotics, cog sci, AI.

https://web.mit.edu/phillipi/
Suppose you have separate datasets X, Y, Z, without known correspondences.

We do the simplest thing: just train a model (e.g., a next-token predictor) on all elements of the concatenated dataset [X,Y,Z].

You end up with a better model of dataset X than if you had trained on X alone!
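A minimal sketch of that recipe, with made-up toy datasets and a bigram counter standing in for a real next-token predictor:

```python
from collections import Counter, defaultdict

# Toy unpaired datasets: token sequences with no known correspondences
# (contents are made up purely for illustration).
X = [[1, 2, 3, 4], [2, 3, 4, 5]]
Y = [[10, 11, 12], [11, 12, 13]]
Z = [[20, 21, 22], [21, 22, 23]]

# The simplest thing: train one model on the concatenated dataset [X, Y, Z].
combined = X + Y + Z

# A bigram counter as a stand-in for a next-token predictor.
counts = defaultdict(Counter)
for seq in combined:
    for prev, nxt in zip(seq, seq[1:]):
        counts[prev][nxt] += 1

def predict_next(token):
    """Most likely next token under the toy model, or None if unseen."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]
```

The claim in the post is about real models: the version trained on the concatenation ends up modeling X better than a version trained on X alone.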

6/9
October 10, 2025 at 10:13 PM
(Note: these thread posts appear in feed order, most recent first.)
In “Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models,” we study a question I’ve wanted to make progress on for years: can you learn useful multimodal representations from *unpaired* data?

5/9
October 10, 2025 at 10:13 PM
In “Words That Make Language Models Perceive,” we find that if you ask an LLM to “imagine seeing,” its representation of the text becomes more like how a vision system would represent that same scene.

If you ask it to “imagine hearing,” its representation becomes more like that of an auditory model.
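One standard way to quantify how "alike" two models' representations are is a similarity index such as linear CKA, computed on the two models' embeddings of the same inputs (an illustrative sketch; the paper's exact metric may differ):

```python
import numpy as np

def linear_cka(A, B):
    """Linear CKA between two representation matrices of shape
    (n_samples, n_features); 1.0 means identical up to rotation/scale."""
    A = A - A.mean(axis=0)  # center each feature
    B = B - B.mean(axis=0)
    hsic = np.linalg.norm(A.T @ B, "fro") ** 2
    return hsic / (np.linalg.norm(A.T @ A, "fro") * np.linalg.norm(B.T @ B, "fro"))
```

Under a measure like this, the finding corresponds to the prompted LLM's text embeddings scoring higher against a vision (or audio) model's embeddings than the unprompted ones.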

3/9
October 10, 2025 at 10:13 PM
For context, this work stems from the idea that all data modalities (images, sounds, text, etc.) are views of the same underlying world, and that treating them as such is useful.

We are interested in identifying commonalities between different models and modalities, and in unifying them.

2/9
October 10, 2025 at 10:13 PM
Interesting reaction from ChatGPT to the HHS mRNA memo. It finds it so implausible that it thinks it's fake. From the perspective of a ~2024(?) trained model, 2025 policies are so absurd as to be unbelievable...

chatgpt.com/share/689364...
August 6, 2025 at 2:40 PM
Last, a talk on "The Platonic Representation Hypothesis" (arxiv.org/abs/2405.07987) at UniReps.

This talk says: don't worry, we don't need to equip NNs with special distance measures, these structures emerge "for free" at scale!

I've given this one before, but there will be a bit that's new.
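A rough sketch of one way representational alignment gets measured in this line of work: mutual k-nearest-neighbor overlap between two models' embedding spaces (illustrative only; details differ from the paper's exact metric):

```python
import numpy as np

def mutual_knn_alignment(A, B, k=2):
    """Average overlap of k-nearest-neighbor sets computed separately in two
    representation spaces A and B (rows are the same inputs embedded by two
    different models); 1.0 means identical neighborhood structure."""
    def knn_indices(X):
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)  # exclude self-matches
        return np.argsort(d, axis=1)[:, :k]
    na, nb = knn_indices(A), knn_indices(B)
    return float(np.mean([len(set(r1) & set(r2)) / k for r1, r2 in zip(na, nb)]))
```

The hypothesis is that this kind of alignment score rises with model scale, across modalities, without any special machinery.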
December 10, 2024 at 6:52 PM
Next, we have "Scalable Optimization in the Modular Norm", aka "Modula" (arxiv.org/abs/2405.14813)

This one is about distance between settings of the *weights*.

Its answer: incorporate knowledge about the neural net *architecture* into how you measure distance.
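As a caricature of "architecture-aware distance" (the paper's modular norm is defined recursively over the module tree; this sketch just scales per-layer spectral norms and takes a max, with made-up scales):

```python
import numpy as np

def architecture_aware_distance(weights_a, weights_b, layer_scales):
    """Illustrative weight-space distance: a max over layers of scaled
    spectral norms of the per-layer weight differences. The scales are
    where knowledge of the architecture (fan-in, depth, etc.) would enter."""
    return max(
        scale * np.linalg.norm(wa - wb, 2)  # spectral norm for 2-D weights
        for wa, wb, scale in zip(weights_a, weights_b, layer_scales)
    )
```

Measuring weight-space distance this way, rather than with a flat Euclidean norm over all parameters, is what lets learning rates transfer across widths and depths.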
December 10, 2024 at 6:52 PM
First up, "When Does Perceptual Alignment Benefit Vision Representations?" (arxiv.org/abs/2410.10817)

This paper is about distance between embeddings.

It says: measure how humans perceive distance, then adjust a neural net to match.

This improves transfer to lots of tasks (but not all tasks).
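One common way to "adjust a neural net to match" human similarity judgments is a triplet objective on its embeddings; a hedged sketch (the margin and names are illustrative, not the paper's exact loss):

```python
import numpy as np

def perceptual_triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge loss that is zero when the image humans judged more similar to
    the anchor (positive) is already closer in embedding space than the
    other image (negative) by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing a loss like this over human-labeled triplets nudges the embedding's distances toward perceived distances.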
December 10, 2024 at 6:52 PM
At NeurIPS this year my lab is sharing a few papers and talks.

They are all about the following question: how to characterize the geometry of deep learning problems, and in particular how to measure *distance*?

Each paper/talk gives a rather different answer, detailed below:
December 10, 2024 at 6:52 PM
Making a lecture on inference methods for deep nets. Here is my attempt at mapping out the interplay between training and inference.

A few items I wasn't sure where to put. You could break it down differently. What did I get wrong?
December 3, 2024 at 3:55 AM
Sharing some new work!

A big dream in AI is to create world models of sufficient quality that you can train agents within them.

Classic simulators lack visual diversity and realism. GenAI lacks physical accuracy. But combining the two can work pretty well!

Paper: arxiv.org/abs/2411.00083
November 14, 2024 at 1:30 AM