Giorgos Tolias
@gtolias.bsky.social
Associate Professor at CTU in Prague. Computer Vision Researcher at the Visual Recognition Group vrg.fel.cvut.cz. Made in Greece, exported to France and Czech Republic.
https://cmp.felk.cvut.cz/~toliageo
This is a paper that will be presented next month at #NeurIPS2025. The dataset and code are already publicly available.
November 6, 2025 at 2:12 PM
The studied setting makes it possible to explore large image collections in flexible and creative ways: query with an image showing a particular object and add a text query to transform aspects like context, environment, lighting conditions, object state, and more.
November 6, 2025 at 2:12 PM
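As a rough illustration of the composed image-plus-text query described in the post above, here is a minimal sketch assuming CLIP-like embeddings for the query image, the text modifier, and the gallery; the simple additive fusion and all names are illustrative assumptions, not the paper's actual method.

```python
# Hedged sketch of a composed (image + text) retrieval query.
# Assumes precomputed embeddings from a CLIP-like model; the additive
# fusion rule is a placeholder, real methods typically learn this fusion.
import numpy as np

def composed_query(query_image_emb, text_emb, gallery_embs, top_k=10):
    """Combine an image query with a text modifier and rank the gallery."""
    q = query_image_emb + text_emb                      # naive fusion of the two modalities
    q = q / np.linalg.norm(q)
    gallery = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    scores = gallery @ q                                # cosine similarity to every gallery image
    return np.argsort(-scores)[:top_k]                  # indices of the best-matching images
```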
AnyUp is great. We are already using it, and it works flawlessly.
October 29, 2025 at 4:31 PM
This is the 7th edition of a workshop series that started with landmark recognition alone (CVPR18, CVPR19) and later broadened its scope to instance-level recognition (ECCV20, ICCV21, ECCV22, ECCV24). This year we are expanding to include the so-called personalized (instance-level) generation models.
October 16, 2025 at 6:53 AM
This is an in-person event; physical attendance only.
October 6, 2025 at 3:13 PM
This hints that a methodology like the one proposed in TULIP (by UC Berkeley), which optimizes both cross-modal and intra-modal relationships during pre-training, should make it into mainstream beast models.
arxiv.org/abs/2503.15485
TULIP: Towards Unified Language-Image Pretraining
Despite the recent success of image-text contrastive models like CLIP and SigLIP, these models often struggle with vision-centric tasks that demand high-fidelity image understanding, such as counting,...
September 8, 2025 at 1:57 PM
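To make the idea of optimizing both cross-modal and intra-modal relationships concrete, here is a minimal sketch of a combined contrastive objective, assuming paired image/text embeddings plus an augmented image view and a paraphrased caption per item; the plain InfoNCE terms and equal loss weights are assumptions, not TULIP's exact formulation.

```python
# Hedged sketch of a combined cross-modal + intra-modal contrastive loss.
# Tensor names (img, img_aug, txt, txt_aug) and equal weighting are assumptions.
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE between two batches of embeddings."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def combined_loss(img, img_aug, txt, txt_aug):
    cross_modal = info_nce(img, txt)        # CLIP/SigLIP-style image-text term
    intra_image = info_nce(img, img_aug)    # image-image term over augmented views
    intra_text  = info_nce(txt, txt_aug)    # text-text term over paraphrased captions
    return cross_modal + intra_image + intra_text
```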
Hypothesis: VLMs only optimize cross-modal relationships and not image-to-image relationships and, as a consequence, the visual representation space exhibits low local semantic consistency. Nevertheless, this appears to be easy to fix at a post-pre-training stage.
September 8, 2025 at 1:57 PM
After such a linear adaptation, Perception Encoder is the new SoA, achieving 33.4% vs. 28.3% for DINOv3. Without the adaptation step, the respective performances are 22.0% and 26.5%.
September 8, 2025 at 1:57 PM
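For illustration, here is one plausible form such a post-pre-training linear adaptation could take: a ridge-regularized least-squares map from frozen VLM image features toward a reference embedding space with better image-to-image structure. The target space, the closed-form fit, and all array names are assumptions for the sketch, not necessarily the procedure behind the reported numbers.

```python
# Hedged sketch of a linear adaptation of frozen VLM image features.
# vlm_feats: (N, d) features from the VLM image encoder.
# target_feats: (N, d2) features from a reference space (an assumption here).
import numpy as np

def fit_linear_adapter(vlm_feats, target_feats, reg=1e-3):
    """Ridge-regularized least squares: find W so that vlm_feats @ W ~ target_feats."""
    d = vlm_feats.shape[1]
    gram = vlm_feats.T @ vlm_feats + reg * np.eye(d)
    return np.linalg.solve(gram, vlm_feats.T @ target_feats)

def adapt(vlm_feats, W):
    """Apply the adapter and unit-normalize for cosine-similarity retrieval."""
    z = vlm_feats @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)
```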