PhD from University of Edinburgh.
ibalazevic.github.io
We take a step towards unravelling its mystery by explaining why the phenomenon of disentanglement arises in generative latent variable models.
Blog post: carl-allen.github.io/theory/2024/...
If you’re interested in learning why, I highly recommend giving Carl’s blog a read!
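For readers new to the setting, here is a minimal sketch of the objective such models are usually trained with, assuming the canonical (β-)VAE formulation with encoder parameters φ and decoder parameters θ (standard notation, not from the post, and not specific to Carl's analysis):

$$
\mathcal{L}_{\beta}(\theta,\phi;x) \;=\; \mathbb{E}_{q_\phi(z\mid x)}\!\left[\log p_\theta(x\mid z)\right] \;-\; \beta\,\mathrm{KL}\!\left(q_\phi(z\mid x)\,\big\|\,p(z)\right), \qquad p(z)=\mathcal{N}(0,I).
$$

"Disentanglement" is the empirical observation that, under objectives like this, individual coordinates of the latent z often end up aligned with distinct generative factors of the data.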
I will also be at #NeurIPS2024, so come say hi! (Please email me to find time to chat)
However, the big_vision codebase is criminally undocumented. I tried using it outside Google to fine-tune PaliGemma and SigLIP on GPUs, and wrote a tutorial: lb.eyer.be/a/bv_tuto.html
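For a rough idea of what a run looks like, here is a hedged sketch assuming big_vision's usual "launch the trainer module with a Python config file" pattern; the config path below is hypothetical, so see the tutorial for the real GPU setup and config names:

```python
# Minimal launch sketch (assumption: big_vision is cloned and on PYTHONPATH).
# big_vision trainers are started as modules driven by a Python config file.
import subprocess

subprocess.run([
    "python", "-m", "big_vision.train",
    # Hypothetical config path; real PaliGemma/SigLIP transfer configs live
    # under big_vision/configs/ -- see lb.eyer.be/a/bv_tuto.html for details.
    "--config", "big_vision/configs/proj/paligemma/transfers/my_task.py",
    "--workdir", "/tmp/bv_workdir",  # where checkpoints and logs are written
], check=True)
```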
Context-Aware Multimodal Pretraining
Now on arXiv
Can you turn vision-language models into strong any-shot models?
Go beyond zero-shot performance with SigLixP (x for context)
Read @confusezius.bsky.social's thread below…
And follow Karsten … a rising star!
Turns out you can, and here is how: arxiv.org/abs/2411.15099
Really excited to share this work on multimodal pretraining for my first bluesky entry!
🧵 A short and hopefully informative thread:
Fun project with @confusezius.bsky.social, @zeynepakata.bsky.social, @dimadamen.bsky.social and
@olivierhenaff.bsky.social.
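Not from the thread, but for context: a minimal numpy sketch of the SigLIP-style pairwise sigmoid loss that this line of work builds on. The context ("x") machinery the paper adds is its contribution and is not reproduced here, and the temperature/bias values are illustrative initializations rather than the paper's settings.

```python
import numpy as np

def siglip_loss(img_emb, txt_emb, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss: the i-th image/text pair is a positive,
    every other pair in the batch is a negative."""
    # L2-normalize both embedding sets.
    img = img_emb / np.linalg.norm(img_emb, axis=-1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=-1, keepdims=True)
    logits = temperature * img @ txt.T + bias       # (n, n) similarity logits
    labels = 2.0 * np.eye(len(img)) - 1.0           # +1 on diagonal, -1 off it
    # -log sigmoid(labels * logits) == log(1 + exp(-labels * logits)),
    # averaged over all n^2 pairs.
    return np.mean(np.log1p(np.exp(-labels * logits)))

# Toy usage with random embeddings.
rng = np.random.default_rng(0)
print(siglip_loss(rng.normal(size=(4, 8)), rng.normal(size=(4, 8))))
```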