Personal website: https://alicebizeul.github.io
📜 arXiv: arxiv.org/abs/2502.06314
👩‍💻 Code: github.com/alicebizeul/...
In MAEs, the optimal masking ratio varies across datasets. In PMAE, we show that masking the PCs that account for 20% of the data variance consistently yields near-optimal performance.
In MAEs, the masking ratio is the proportion of masked-out pixels.
In PMAE, we make the masking ratio more data-driven by leveraging PCA. The masking ratio now reflects the proportion of data variance captured by the set of masked PCs.
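To make the variance-based ratio concrete, here is a minimal NumPy sketch. It assumes the masked set is drawn by shuffling the PCs and adding them until their explained variance reaches the target ratio; the function names and this sampling rule are illustrative assumptions, not taken from the released code.

```python
import numpy as np

def explained_variance_ratios(X):
    """Fraction of total variance carried by each principal component of X (n, d)."""
    Xc = X - X.mean(axis=0)
    # Singular values of the centered data give the per-component variance.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

def sample_masked_pcs(var_ratio, mask_ratio, rng):
    """Hypothetical sampler: add randomly ordered PCs until they cover
    at least `mask_ratio` of the data variance."""
    order = rng.permutation(len(var_ratio))
    cum = np.cumsum(var_ratio[order])
    # First position where the cumulative variance reaches the target.
    n_masked = min(int(np.searchsorted(cum, mask_ratio)) + 1, len(order))
    mask = np.zeros(len(var_ratio), dtype=bool)
    mask[order[:n_masked]] = True
    return mask
```

With `mask_ratio=0.2`, the number of masked PCs changes from draw to draw, while the share of variance they hold stays at or just above 20%.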
Using a ViT-Tiny, we observe an average 38% improvement in linear probing performance compared to MAEs with the standard 75% masking ratio.
For natural images, projecting data onto its principal components partitions the information into a set of global features.
By masking principal components instead of raw pixels, we effectively mask more global rather than local features.
We keep it simple: we consider the space of principal components and reconstruct masked-out principal components instead of raw pixels.
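A rough sketch of what that could look like on flattened images, using NumPy. It assumes the visible input is the image rebuilt from the unmasked PCs and the target is the part of the image spanned by the masked PCs; `pmae_input_and_target` is a hypothetical helper, and the actual pipeline in the paper may differ.

```python
import numpy as np

def pmae_input_and_target(X, mask):
    """Split flattened images X (n, d) into a visible part and a masked part.

    `mask` is a boolean vector over principal components (True = masked).
    Both outputs live in centered pixel space and sum to the centered images
    when all components are kept.
    """
    Xc = X - X.mean(axis=0)
    # Rows of `components` are the principal directions.
    _, _, components = np.linalg.svd(Xc, full_matrices=False)
    coeffs = Xc @ components.T                 # image coordinates in PC space
    visible = (coeffs * ~mask) @ components    # rebuilt from unmasked PCs only
    target = (coeffs * mask) @ components      # what the model has to predict
    return visible, target
```

Because the principal directions are orthonormal, the visible and target parts are orthogonal to each other, so the visible input carries no linear copy of the target.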
❌ Visible pixels may be redundant with the masked ones.
❌ Visible pixels may be unpredictive of the masked regions.