
Alice Bizeul

@alicebizeul

PhD student @ETH AI Center working on self-supervised representation learning | Previously @EPFL, @MIT, Research Intern @Amazon Personal website: https://alicebizeul.github.io

40 Followers · 17 Following · 11 Posts · Joined 16.11.2024

Latest posts by Alice Bizeul @alicebizeul

Preview: From Pixels to Components: Eigenvector Masking for Visual Representation Learning
Predicting masked from visible parts of an image is a powerful self-supervised approach for visual representation learning. However, the common practice of masking random patches of pixels exhibits ce...

[10/🧡] This work is the result of an amazing team effort w/ Julius von Kügelgen, Alain Ryser, Thomas Sutter, Bernhard Schölkopf, and Julia Vogt

📜 arXiv: arxiv.org/abs/2502.06314
👩‍💻 Code: github.com/alicebizeul/...

19.03.2025 20:44 👍 0 🔁 0 💬 0 📌 0
Post image

[9/🧡] As a result, PMAE’s masking ratio becomes a more interpretable and robust hyperparameter!

Unlike MAEs, where the optimal ratio varies across datasets, we show that masking PCs that account for 20% of the data variance consistently yields near-optimal performance.

19.03.2025 20:44 👍 0 🔁 0 💬 1 📌 0
Post image

[8/🧡] What about the masking ratio?

In MAEs, this ratio represents the proportion of masked-out pixels.

In PMAE, we make the masking ratio more data-driven by leveraging PCA. The masking ratio now reflects the proportion of data variance captured by the set of masked PCs.
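One plausible way to realise such a variance-based masking ratio, sketched with scikit-learn's PCA on flattened images (the helper name select_masked_pcs, the random sampling of PCs, and the toy data are illustrative, not the released code):

```python
# Illustrative sketch: pick a random set of PCs whose cumulative explained
# variance reaches a target ratio (e.g. 20% of the data variance).
import numpy as np
from sklearn.decomposition import PCA

def select_masked_pcs(flat_images, target_variance=0.2, seed=0):
    """flat_images: (n_samples, n_pixels); returns the fitted PCA and masked PC indices."""
    pca = PCA().fit(flat_images)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(pca.explained_variance_ratio_))   # random order over PCs
    cumulative = np.cumsum(pca.explained_variance_ratio_[order])
    k = int(np.searchsorted(cumulative, target_variance)) + 1     # smallest set reaching the target
    return pca, order[:k]

# Toy usage with random data standing in for flattened images
X = np.random.rand(256, 32 * 32)
pca, masked = select_masked_pcs(X, target_variance=0.2)
print(f"masking {len(masked)} PCs covering "
      f"{pca.explained_variance_ratio_[masked].sum():.2%} of the variance")
```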

19.03.2025 20:44 👍 0 🔁 0 💬 1 📌 0
Post image

[7/🧡] We show that PMAE outperforms MAEs in downstream image classification on CIFAR10, TinyImageNet and MedMNIST datasets.

Using a ViT-Tiny, we observe an average 38% improvement in linear probing performance compared to MAEs with the standard 75% masking ratio.
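For readers less familiar with the metric: linear probing freezes the pretrained encoder and trains only a linear classifier on its features. A minimal sketch of such an evaluation, assuming a PyTorch encoder and standard data loaders (this is not the paper's evaluation code, and using scikit-learn's LogisticRegression as the probe is my choice):

```python
# Minimal linear-probing sketch: extract frozen-encoder features, fit a linear classifier.
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def extract_features(encoder, loader):
    encoder.eval()
    feats, labels = [], []
    for images, targets in loader:
        feats.append(encoder(images).flatten(1).cpu())   # features from the frozen encoder
        labels.append(targets)
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

def linear_probe(encoder, train_loader, test_loader):
    X_tr, y_tr = extract_features(encoder, train_loader)
    X_te, y_te = extract_features(encoder, test_loader)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)   # the linear probe
    return clf.score(X_te, y_te)                              # top-1 accuracy
```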

19.03.2025 20:44 👍 0 🔁 0 💬 1 📌 0
Post image

[6/🧡] However, instead of working with a subset of pixels, the ViT processes the original image with a subset of its principal components (PCs) masked out. The model is then trained to output images that, when projected onto the masked PCs, match the ground truth.
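A rough sketch of how this objective could look, based on my reading of the thread rather than the released implementation; here `model` stands in for the ViT encoder-decoder, images are flattened and assumed already centred (PCA mean removed), and `components` holds the orthonormal principal components as rows:

```python
import torch
import torch.nn.functional as F

def pmae_style_loss(model, images, components, masked_idx):
    """
    images:      (B, D) flattened, centred images
    components:  (K, D) orthonormal principal components (rows)
    masked_idx:  indices of the PCs that are masked out
    """
    coeffs = images @ components.T            # project images onto all PCs
    visible = coeffs.clone()
    visible[:, masked_idx] = 0.0              # drop the masked PCs
    inputs = visible @ components             # back to pixel space -> model input

    recon = model(inputs)                     # encoder-decoder output, (B, D)

    masked_dirs = components[masked_idx]      # (M, D)
    pred = recon @ masked_dirs.T              # prediction projected onto the masked PCs
    target = coeffs[:, masked_idx]            # ground-truth coefficients along those PCs
    return F.mse_loss(pred, target)
```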

19.03.2025 20:44 👍 0 🔁 0 💬 1 📌 0

[5/🧡] Our approach, Principal Masked Autoencoders (PMAE), closely follows the design of the Masked Autoencoder (MAE): a Vision Transformer (ViT) encoder-decoder is trained to reconstruct missing information from the visible parts.

19.03.2025 20:44 👍 0 🔁 0 💬 1 📌 0

[4/🧡] We posit that this reduces the redundancy between visible and masked-out information and ensures the visible information is predictive of masked-out components.

19.03.2025 20:44 👍 0 🔁 0 💬 1 📌 0
Post image

[3/🧡] Need a refresher on PCA?

For natural images, projecting data onto its principal components partitions the information into a set of global features.

By masking principal components instead of raw pixels, we effectively mask more global rather than local features.
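As a quick code-level illustration of that refresher (toy random data stands in for natural images, where the leading PCs carry the coarse, global structure):

```python
# Split each image into a "global" part (top PCs) and the residual (remaining PCs).
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(500, 28 * 28)     # stand-in for flattened natural images
X = X - X.mean(axis=0)               # PCA assumes centred data

pca = PCA(n_components=100).fit(X)
coeffs = pca.transform(X)            # per-image PC coefficients

keep = 20                            # reconstruct from the top-20 PCs only
global_part = coeffs[:, :keep] @ pca.components_[:keep]   # coarse, global content
residual = coeffs[:, keep:] @ pca.components_[keep:]      # finer, more local content

print("variance captured by the top-20 PCs:", pca.explained_variance_ratio_[:keep].sum())
```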

19.03.2025 20:44 👍 0 🔁 0 💬 1 📌 0
Post image

[2/🧡] What if, instead of masking pixels, we mask information in a more meaningful space using off-the-shelf image transformations?

We keep it simple: we consider the space of principal components and reconstruct masked-out principal components instead of raw pixels.
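One way to see the parallel (my framing, not the paper's): both pixel masking and PC masking remove an image's projection onto a set of orthonormal directions; only the basis changes. A small sketch under that view, again with toy data and scikit-learn's PCA:

```python
import numpy as np
from sklearn.decomposition import PCA

def remove_components(x, directions):
    """Subtract the projection of x onto the given orthonormal directions (rows)."""
    return x - directions.T @ (directions @ x)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16 * 16))   # stand-in for flattened images
X -= X.mean(axis=0)                       # centre the data
x = X[0]

# Pixel masking: the removed directions are standard basis vectors, i.e. zero out pixels.
pixel_idx = rng.choice(x.size, size=int(0.75 * x.size), replace=False)
pixel_masked = remove_components(x, np.eye(x.size)[pixel_idx])

# PC masking: the removed directions are principal components of the dataset.
pca = PCA(n_components=64).fit(X)
pc_idx = rng.choice(64, size=16, replace=False)
pc_masked = remove_components(x, pca.components_[pc_idx])
```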

19.03.2025 20:44 👍 0 🔁 0 💬 1 📌 0
Post image

[1/🧡] Unlike text, images are not compact representations. Masking and reconstructing 75% of raw pixels, a common practice in MIM, can thus lead to failure cases:
❌ Visible pixels may be redundant with the masked ones.
❌ Visible pixels may be unpredictive of the masked regions.

19.03.2025 20:44 👍 0 🔁 0 💬 1 📌 0
Post image

✨ New Preprint ✨ Ever thought that reconstructing masked pixels for image representation learning seems sub-optimal?

In our new preprint, we show how masking principal components, rather than raw pixel patches, improves Masked Image Modelling (MIM).

Find out more below 🧡

19.03.2025 20:44 👍 3 🔁 1 💬 1 📌 0