Alexander Kolesnikov
kolesnikov.ch
Reposted by Alexander Kolesnikov
Looking for a small or medium-sized VLM? PaliGemma 2 spans more than 150x of compute!

Not sure yet if you want to invest the time 🪄finetuning🪄 on your data? Give it a try with our ready-to-use "mix" checkpoints:

🤗 huggingface.co/blog/paligem...
🎤 developers.googleblog.com/en/introduci...
February 19, 2025 at 5:47 PM
With some delay, JetFormer's *prequel* paper is finally out on arXiv: a radically simple ViT-based normalizing flow (NF) model that achieves SOTA results in its class.

Jet is one of the key components of JetFormer, deserving a standalone report. Let's unpack: 🧵⬇️
December 20, 2024 at 2:39 PM
PaliGemma 2 is out! Bigger models, better results. For the best experience, do not forget to finetune.

Congrats, PaliGemma 2 team!
🚀🚀PaliGemma 2 is our updated and improved PaliGemma release using the Gemma 2 models and providing new pre-trained checkpoints for the full cross product of {224px,448px,896px} resolutions and {3B,10B,28B} model sizes.

1/7
December 5, 2024 at 6:28 PM
OK, it is yesterday's news already, but a good night's sleep is important.

After 7 amazing years at Google Brain/DM, I am joining OpenAI. Together with @xzhai.bsky.social and @giffmana.ai, we will establish the OpenAI Zurich office. Proud of our past work and looking forward to the future.
December 4, 2024 at 9:14 AM
Reposted by Alexander Kolesnikov
In arxiv.org/abs/2303.00848, @dpkingma.bsky.social and @ruiqigao.bsky.social suggested that noise augmentation could be used to make other likelihood-based models optimise perceptually weighted losses, like diffusion models do. So cool to see this working well in practice!
December 2, 2024 at 6:36 PM
The answer has just dropped: bsky.app/profile/kole...
2021: Replace every CNN with a Transformer

2022: Replace every GAN with diffusion models

2023: Replace every NeRF with 3DGS

2024: Replace every diffusion model with Flow Matching

2025: ???
December 2, 2024 at 7:00 PM
I always dreamed of a model that simultaneously

1. optimizes NLL of raw pixel data,
2. generates competitive high-res. natural images,
3. is practical.

But it seemed too good to be true. Until today!

Our new JetFormer model (arxiv.org/abs/2411.19722) ticks all of these boxes.

🧵
December 2, 2024 at 5:19 PM