I would love to get any feedback, so please feel free to reach out!
🧵9/9
Thus, increasing a token's norm or maintaining non-sparse representations becomes more costly for the ViT, discouraging repurposing and promoting better representations and, as a result, better quantitative performance.
🧵8/9
🧵7/9
🧵6/9
By choosing appropriate amplitudes, we can preserve the topology of the latent space: the smaller the amplitude, the fewer modifications we expect in the topology.
🧵5/9
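A rough way to see the amplitude/topology trade-off is to perturb a set of embeddings with noise of a given amplitude and check how many nearest-neighbour relations survive. This is only an illustrative sketch (the function name and the nearest-neighbour proxy are my assumptions, not the paper's metric):

```python
import numpy as np

def neighbor_preservation(z, amplitude, seed=0):
    """Fraction of points whose nearest neighbour is unchanged after a
    random perturbation of the given amplitude -- a crude proxy for how
    much the latent-space topology is modified."""
    rng = np.random.default_rng(seed)
    z2 = z + rng.normal(0.0, amplitude, size=z.shape)

    def nearest(a):
        # Pairwise distances, ignoring self-distances on the diagonal.
        d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        return d.argmin(axis=1)

    return float((nearest(z) == nearest(z2)).mean())

rng = np.random.default_rng(0)
z = rng.normal(size=(50, 8))          # 50 toy embeddings in 8 dimensions
p_small = neighbor_preservation(z, 0.01)
p_large = neighbor_preservation(z, 5.0)
```

With amplitude 0 the embedding is untouched and preservation is exactly 1; as the amplitude grows, neighbourhoods (and hence topology) are increasingly disrupted.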
We proposed replacing the MLPs' learnable parameters with random variables, turning their outputs into random embeddings and creating the ✨Randomized-MLP (RMLP)✨. This architecture has one hyperparameter, the amplitude, which controls the standard deviation of the random variables.
🧵4/9
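A minimal numpy sketch of the idea, assuming Gaussian random weights and a ReLU nonlinearity (the function name `rmlp_forward` and the exact weight distribution are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def rmlp_forward(x, hidden_dim, out_dim, amplitude, seed=0):
    """Sketch of a Randomized-MLP (RMLP) forward pass.

    Instead of learnable weights, both linear layers draw their weights
    as random variables whose standard deviation is set by `amplitude`,
    so the outputs are random embeddings rather than learned ones.
    """
    rng = np.random.default_rng(seed)
    in_dim = x.shape[-1]
    # Random weights: sampled, never trained; `amplitude` is the std dev.
    w1 = rng.normal(0.0, amplitude, size=(in_dim, hidden_dim))
    w2 = rng.normal(0.0, amplitude, size=(hidden_dim, out_dim))
    h = np.maximum(x @ w1, 0.0)  # ReLU keeps the sketch simple
    return h @ w2

x = np.ones((4, 16))  # 4 tokens with 16-dim features
z = rmlp_forward(x, hidden_dim=32, out_dim=8, amplitude=0.02)
```

Because no parameters are learned here, none of the student's capacity can be "stored" in this head, which is the repurposing pressure the thread describes.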
Taking the DINO and iBOT losses, consider a teacher that provides two stable classes. The student and its MLP then need to learn to match that classification, and part of that learning might end up in the MLP.
🧵3/9
Token norms have been used to spot ViTs repurposing patch tokens to encode global information in void regions of natural images, and regularisation techniques have been developed to avoid this. We observed this behaviour in regularised models when applied to medical images.
🧵2/9
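The norm check itself is simple: compute the L2 norm of every patch token and flag outliers. A hedged sketch, where the function name and the mean-plus-k-std threshold are illustrative choices of mine, not the diagnostic from the literature:

```python
import numpy as np

def flag_repurposed_tokens(tokens, k=2.0):
    """Flag patch tokens whose L2 norm is unusually large -- a simple
    proxy for tokens repurposed to store global information.

    `k` (how many standard deviations above the mean counts as an
    outlier) is an illustrative threshold, not taken from the paper.
    """
    norms = np.linalg.norm(tokens, axis=-1)
    return norms > norms.mean() + k * norms.std()

rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 64))   # e.g. a 14x14 patch grid, 64-dim tokens
tokens[0] *= 10.0                     # simulate one repurposed high-norm token
flags = flag_repurposed_tokens(tokens)
```

On this toy input, only the artificially inflated token crosses the threshold.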