Alice Bizeul
@alicebizeul.bsky.social
PhD student @ETH AI Center working on self-supervised representation learning | Previously @EPFL, @MIT, Research Intern @Amazon
Personal website: https://alicebizeul.github.io
[9/🧵] As a result, PMAE’s masking ratio becomes a more interpretable and robust hyperparameter!

While the optimal ratio in MAEs varies across datasets, we show that masking PCs that account for 20% of the data variance consistently yields near-optimal performance.
March 19, 2025 at 8:44 PM
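A minimal sketch of that selection rule, assuming per-PC explained variance from a PCA fit. Masking the leading PCs up to the target is just one illustrative policy, and all names here are mine, not the paper's:

```python
# Pick the smallest number of leading PCs whose explained variance sums to a
# target fraction (20% per the post above). Illustrative only: PMAE's actual
# PC sampling strategy may differ.
import numpy as np

def num_pcs_for_variance(explained_variance_ratio, target=0.20):
    """Smallest k such that the k leading PCs capture >= target variance."""
    cumulative = np.cumsum(explained_variance_ratio)
    return int(np.searchsorted(cumulative, target) + 1)

# Toy spectrum: eigenvalues of the covariance of correlated synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64)) @ rng.normal(size=(64, 64))
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending order
print(num_pcs_for_variance(eigvals / eigvals.sum()))
```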
[8/🧵] What about the masking ratio?

In MAEs, this ratio represents the proportion of masked-out pixels.

In PMAE, we make the masking ratio more data-driven by leveraging PCA. The masking ratio now reflects the proportion of data variance captured by the set of masked PCs.
March 19, 2025 at 8:44 PM
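To make the definition concrete, here is a hedged sketch using scikit-learn's PCA; the masked set is drawn at random for illustration, and none of these names come from the paper's code:

```python
# MAE's ratio counts masked pixels; PMAE's ratio sums the explained variance
# of the masked PCs. Sketch under the assumption of flattened images.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 16 * 16))         # stand-in for flattened images
pca = PCA().fit(X)

mae_ratio = 0.75                             # MAE: fraction of pixels masked

masked_pcs = rng.choice(pca.n_components_, size=64, replace=False)
pmae_ratio = pca.explained_variance_ratio_[masked_pcs].sum()
print(f"PMAE masking ratio (variance of masked PCs): {pmae_ratio:.2f}")
```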
[7/🧵] We show that PMAE outperforms MAEs in downstream image classification on CIFAR10, TinyImageNet and MedMNIST datasets.

Using a ViT-Tiny, we observe an average 38% improvement in linear probing performance compared to MAEs with the standard 75% masking ratio.
March 19, 2025 at 8:44 PM
[6/🧵] However, instead of working with a subset of pixels, the ViT processes the original image with a subset of its principal components (PCs) masked out. The model is then trained to output images that, when projected onto the masked PCs, match the ground truth.
March 19, 2025 at 8:44 PM
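A minimal PyTorch sketch of that objective as I read the post above, with a placeholder `model` and a generic orthonormal PC basis `V`; this is one reading of the setup, not the authors' implementation:

```python
# The encoder sees the image with the masked PCs' content removed; the loss
# compares prediction and ground truth only after projecting onto those PCs.
# Assumes flattened, centered images and an orthonormal PC basis V of shape (D, K).
import torch

def pmae_step(model, x, V, masked_idx):
    V_m = V[:, masked_idx]               # (D, M) basis of the masked PCs
    coeffs = x @ V_m                     # ground-truth coordinates on masked PCs
    x_in = x - coeffs @ V_m.T            # image with masked PCs removed
    x_hat = model(x_in)                  # e.g. a ViT, output flattened to (B, D)
    return torch.mean((x_hat @ V_m - coeffs) ** 2)

# Toy usage with a linear stand-in for the ViT.
D = 64
V, _ = torch.linalg.qr(torch.randn(D, D))    # random orthonormal basis
loss = pmae_step(torch.nn.Linear(D, D), torch.randn(8, D), V, list(range(16)))
loss.backward()
```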
[3/🧵] Need a refresher on PCA?

For natural images, projecting the data onto its principal components partitions the information into a set of global features.

By masking principal components instead of raw pixels, we effectively mask more global rather than local features.
March 19, 2025 at 8:44 PM
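A quick refresher in code, with random data standing in for flattened natural images (on real images the leading PCs look like smooth, global patterns):

```python
# Project images onto a PCA basis and reconstruct from a subset of PCs.
# Zeroing a PC's coordinate removes a global pattern from every image at once,
# unlike pixel masking, which removes a local region at one location.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 28 * 28))      # stand-in for flattened images

pca = PCA(n_components=100).fit(X)
Z = pca.transform(X)                       # per-image PC coordinates

Z[:, 50:] = 0.0                            # "mask" the last 50 of 100 PCs
X_masked = pca.inverse_transform(Z)        # images with those PCs removed
```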
[2/🧵] What if, instead of masking pixels, we mask information in a more meaningful space using off-the-shelf image transformations?

We keep it simple: we consider the space of principal components and reconstruct masked-out principal components instead of raw pixels.
March 19, 2025 at 8:44 PM
[1/🧵] Unlike text, images are not compact representations. Masking and reconstructing 75% of raw pixels—a common practice in MIM—can thus lead to failure cases:
❌ Visible pixels may be redundant with the masked ones.
❌ Visible pixels may not be predictive of the masked regions.
March 19, 2025 at 8:44 PM
✨New Preprint ✨ Ever thought that reconstructing masked pixels for image representation learning seems sub-optimal?

In our new preprint, we show how masking principal components—rather than raw pixel patches—improves Masked Image Modelling (MIM).

Find out more below 🧵
March 19, 2025 at 8:44 PM