Karsten Roth
@confusezius.bsky.social
Large Models, Multimodality, Continual Learning | ELLIS ML PhD with Oriol Vinyals & Zeynep Akata | Previously Google DeepMind, Meta AI, AWS, Vector, MILA

🔗 karroth.com
Also very thankful for the research environment provided by @ellis.eu and @mpi-is.bsky.social, which made this PhD such an inter-European experience!
August 4, 2025 at 2:59 PM
Huge thanks also to my thesis committee Peter Gehler, Matthias Bethge, @wielandbrendel.bsky.social and @phillipisola.bsky.social, and of course all the wonderful people and collaborators I had the pleasure of spending time and working with these past years!
August 4, 2025 at 2:59 PM
Reposted by Karsten Roth
📄 Disentangled Representation Learning with the Gromov-Monge Gap

with Théo Uscidda, Luca Eyring, @confusezius.bsky.social, Fabian J Theis, Marco Cuturi

📄 Decoupling Angles and Strength in Low-rank Adaptation

with Massimo Bini, Leander Girrbach
January 24, 2025 at 8:02 PM
We will present on Wednesday - East Exhibit Hall A-C #3703 ☺️. We've also released the entire codebase with all the methods and 60+ dataloaders that can be mixed and matched in any fashion to study continual pretraining!
December 10, 2024 at 4:42 PM
Oh that's a really cool paper! Thanks for the pointer!
November 29, 2024 at 7:22 AM
Oh neat, do you have a link? 😁
November 28, 2024 at 4:32 PM
LIxP was carefully designed and tested for scalability!

LIxP also maintains the strong zero-shot transfer of CLIP and SigLIP backbones across model sizes (S to L) and data scales (up to 15B examples), while enabling up to 4x sample efficiency at test time and up to +16% performance gains!
November 28, 2024 at 2:33 PM
In LIxP, we utilize a learnable temperature separation and a simple cross-attention-based formalism to augment existing contrastive vision-language training.

We teach models what to expect at test-time in few-shot scenarios.
November 28, 2024 at 2:33 PM
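The mechanism described above (cross-attending from a query over a few-shot support set, scaled by a learnable temperature) can be sketched roughly as follows. This is a minimal NumPy illustration with made-up names and shapes, not the LIxP implementation:

```python
import numpy as np

def context_attention(queries, support_feats, support_labels, log_tau):
    """Attend over a few-shot support set with a (learnable) temperature.
    queries: (B, D); support_feats: (N, D); support_labels: (N, C) one-hot."""
    logits = (queries @ support_feats.T) * np.exp(log_tau)   # temperature-scaled similarities
    logits = logits - logits.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the support set
    return weights @ support_labels                          # (B, C) soft class scores

# Toy usage: 4 queries, a support set of 10 examples over 3 classes.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 16))
s = rng.standard_normal((10, 16))
y = np.eye(3)[rng.integers(0, 3, 10)]         # one-hot support labels
log_tau = 0.0                                 # would be a trained parameter in practice
scores = context_attention(q, s, y, log_tau)
print(scores.shape)                           # (4, 3)
```

Because the attention weights sum to one and the labels are one-hot, each row of `scores` is a proper distribution over classes; training the temperature controls how sharply the model commits to nearby support examples.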
They can struggle with applications that require operating on new context, e.g. few-shot adaptation.

Why? They do not explicitly train for that!

We introduce a surrogate objective to optimize for: context-aware language-image pretraining (LIxP).
November 28, 2024 at 2:33 PM
This was an insightful project I worked on at Google DeepMind alongside the amazing @zeynepakata.bsky.social, @dimadamen.bsky.social, @ibalazevic.bsky.social and @olivierhenaff.bsky.social:

👉 Language-image pretraining with CLIP or SigLIP is widely used due to strong zero-shot transfer, but...
November 28, 2024 at 2:33 PM