Alaa El-Nouby
alaaelnouby.bsky.social
Research Scientist at @Apple. Previous: @Meta (FAIR), @Inria, @MSFTResearch, @VectorInst, and @UofG. Egyptian 🇪🇬
Could you clarify which task you tested the checkpoints on, and which checkpoint in particular you used? Thanks!
November 22, 2024 at 4:41 PM
Hey Johan, for AIMv2 please use the last-layer features, typically after the post-trunk layer normalization.
November 22, 2024 at 4:40 PM
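A minimal NumPy sketch of the advice above: take the trunk's last-layer output and apply a layer normalization before using the features. The shapes and the mean-pooling step are illustrative assumptions, not the exact AIMv2 API.

```python
import numpy as np

def post_trunk_layer_norm(x, eps=1e-6):
    """LayerNorm over the feature dimension (no learned scale/shift here)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Hypothetical last-layer trunk output: (batch, num_patches, dim).
rng = np.random.default_rng(0)
trunk_output = rng.normal(size=(1, 256, 1024))

# The recommended features: last layer, after post-trunk layer normalization.
features = post_trunk_layer_norm(trunk_output)

# One common way to get a single image-level embedding: average over patches.
pooled = features.mean(axis=1)
print(features.shape, pooled.shape)  # (1, 256, 1024) (1, 1024)
```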
It has been an absolute pleasure working with Enrico, Mustafa, and the whole AIMv2 team over the past few months. We look forward to seeing our models be useful to the community.

For many more results, insights, and analyses, please check our preprint. arxiv.org/abs/2411.14402
Multimodal Autoregressive Pre-training of Large Vision Encoders
We introduce a novel method for pre-training of large-scale vision encoders. Building on recent advancements in autoregressive pre-training of vision models, we extend this framework to a multimodal s...
arxiv.org
November 22, 2024 at 8:32 AM
The open-sourced AIMv2 checkpoints support a number of fixed resolutions (224px, 336px, and 448px), as well as a native-resolution checkpoint that accepts images of variable resolutions and aspect ratios.
November 22, 2024 at 8:32 AM
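To illustrate why a native-resolution checkpoint matters: a ViT-style encoder splits an image into fixed-size patches, so images of different sizes and aspect ratios simply yield different numbers of tokens. A toy sketch (the patch size of 14 is an assumption for illustration):

```python
import numpy as np

PATCH = 14  # ViT-style patch size; an assumption for this sketch

def patchify(image):
    """Split an HxWxC image into non-overlapping PATCHxPATCH tokens.

    A native-resolution encoder consumes a variable number of such tokens,
    so images need not be resized to a single fixed resolution.
    """
    h, w, c = image.shape
    assert h % PATCH == 0 and w % PATCH == 0, "pad/crop to a multiple of the patch size"
    gh, gw = h // PATCH, w // PATCH
    patches = image.reshape(gh, PATCH, gw, PATCH, c).transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, PATCH * PATCH * c)

square = np.zeros((224, 224, 3))
wide = np.zeros((224, 448, 3))  # different aspect ratio, no resizing needed
print(patchify(square).shape[0], patchify(wide).shape[0])  # 256 512
```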
AIMv2 provides strong off-the-shelf recognition performance, with AIMv2-3B achieving 89.5% on ImageNet with a frozen trunk. We also observe consistent improvements in performance as we scale the parameters of AIMv2 (see Section 3 in the preprint).
November 22, 2024 at 8:32 AM
AIMv2 is pre-trained in a manner similar to modern VLMs; therefore, it can be integrated into them seamlessly, with even our smallest backbone (i.e., AIMv2-L) outperforming popular backbones such as OpenAI CLIP and SigLIP on multimodal understanding benchmarks.
November 22, 2024 at 8:32 AM
AIMv2 is pre-trained to autoregressively generate image patches and text tokens. It is easy to implement and train, and it can be trivially scaled to billions of parameters. We are sharing checkpoints ranging from 300M to 3B parameters, available in PyTorch, JAX, and MLX on 🤗
November 22, 2024 at 8:32 AM
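A toy paraphrase of that objective in NumPy: image patches and text tokens form one causal stream, where patch positions are trained with a regression loss on the next patch and text positions with a cross-entropy loss on the next token. All shapes, sizes, and random values here are illustrative assumptions; see the preprint for the exact losses and architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, num_text, dim, vocab = 4, 3, 8, 10
seq_len = num_patches + num_text

def causal_mask(n):
    """Lower-triangular mask: position i may only attend to positions <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

# Attention over the unified stream would use this autoregressive mask.
mask = causal_mask(seq_len)

# Toy decoder states for the stream: image patches first, then text.
states = rng.normal(size=(seq_len, dim))
patch_targets = rng.normal(size=(num_patches, dim))   # continuous patch values
text_targets = rng.integers(0, vocab, size=num_text)  # discrete token ids

# Early positions predict the NEXT image patch: a regression (MSE) loss ...
patch_preds = states[: num_patches - 1]
patch_loss = np.mean((patch_preds - patch_targets[1:]) ** 2)

# ... while text positions predict the next token: a cross-entropy loss.
logits = rng.normal(size=(num_text, vocab))  # stand-in text-head outputs
log_probs = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
text_loss = -log_probs[np.arange(num_text), text_targets].mean()

print(float(patch_loss) > 0, float(text_loss) > 0)
```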