Giorgos Tolias
@gtolias.bsky.social
Associate Professor at CTU in Prague. Computer Vision Researcher at the Visual Recognition Group vrg.fel.cvut.cz. Made in Greece, exported to France and Czech Republic.
https://cmp.felk.cvut.cz/~toliageo
This is a paper that will be presented next month at #NeurIPS2025. The dataset and code are already publicly available.
November 6, 2025 at 2:12 PM
The studied setting makes it possible to explore large image collections in flexible and creative ways: query with an image showing a particular object and add a text query to transform aspects like context, environment, lighting conditions, object state, and more.
November 6, 2025 at 2:12 PM
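As a rough illustration of the composed image-plus-text query described in the post above, here is a minimal sketch assuming CLIP-like embeddings for the query image, the text modifier, and the gallery; the simple additive fusion and all names are illustrative assumptions, not the paper's actual method.

```python
# Hedged sketch of a composed (image + text) retrieval query.
# Assumes precomputed embeddings from a CLIP-like model; the additive
# fusion rule is a placeholder, real methods typically learn this fusion.
import numpy as np

def composed_query(query_image_emb, text_emb, gallery_embs, top_k=10):
    """Combine an image query with a text modifier and rank the gallery."""
    q = query_image_emb + text_emb                      # naive fusion of the two modalities
    q = q / np.linalg.norm(q)
    gallery = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    scores = gallery @ q                                # cosine similarity to every gallery image
    return np.argsort(-scores)[:top_k]                  # indices of the best-matching images
```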
AnyUp is great. We are already using it, and it works flawlessly.
October 29, 2025 at 4:31 PM
This is the 7th edition of a workshop series that started with landmark recognition alone (CVPR18, CVPR19) and later broadened its scope to instance-level recognition (ECCV20, ICCV21, ECCV22, ECCV24). This year we are expanding to include the so-called personalized (instance-level) generation models.
October 16, 2025 at 6:53 AM
This is an in-person event; physical attendance only.
October 6, 2025 at 3:13 PM
This hints that a methodology like the one proposed in TULIP (by UC Berkeley), which optimizes both cross-modal and intra-modal relationships during pre-training, should make it into mainstream beast models.
arxiv.org/abs/2503.15485
TULIP: Towards Unified Language-Image Pretraining
Despite the recent success of image-text contrastive models like CLIP and SigLIP, these models often struggle with vision-centric tasks that demand high-fidelity image understanding, such as counting,...
September 8, 2025 at 1:57 PM
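To make the idea of optimizing both cross-modal and intra-modal relationships concrete, here is a minimal sketch of a combined contrastive objective, assuming paired image/text embeddings plus an augmented image view and a paraphrased caption per item; the plain InfoNCE terms and equal loss weights are assumptions, not TULIP's exact formulation.

```python
# Hedged sketch of a combined cross-modal + intra-modal contrastive loss.
# Tensor names (img, img_aug, txt, txt_aug) and equal weighting are assumptions.
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE between two batches of embeddings."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def combined_loss(img, img_aug, txt, txt_aug):
    cross_modal = info_nce(img, txt)        # CLIP/SigLIP-style image-text term
    intra_image = info_nce(img, img_aug)    # image-image term over augmented views
    intra_text  = info_nce(txt, txt_aug)    # text-text term over paraphrased captions
    return cross_modal + intra_image + intra_text
```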
Hypothesis: VLMs only optimize cross-modal relationships and not image-to-image relationships and, as a consequence, the visual representation space exhibits low local semantic consistency. Nevertheless, this appears to be easy to fix at a post-pre-training stage.
September 8, 2025 at 1:57 PM
After such a linear adaptation, Perception Encoder is the new SoA, achieving 33.4% vs. 28.3% for DINOv3. Without the adaptation step, the respective performances are 22.0% and 26.5%.
September 8, 2025 at 1:57 PM
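For illustration, here is one plausible form such a post-pre-training linear adaptation could take: a ridge-regularized least-squares map from frozen VLM image features toward a reference embedding space with better image-to-image structure. The target space, the closed-form fit, and all array names are assumptions for the sketch, not necessarily the procedure behind the reported numbers.

```python
# Hedged sketch of a linear adaptation of frozen VLM image features.
# vlm_feats: (N, d) features from the VLM image encoder.
# target_feats: (N, d2) features from a reference space (an assumption here).
import numpy as np

def fit_linear_adapter(vlm_feats, target_feats, reg=1e-3):
    """Ridge-regularized least squares: find W so that vlm_feats @ W ~ target_feats."""
    d = vlm_feats.shape[1]
    gram = vlm_feats.T @ vlm_feats + reg * np.eye(d)
    return np.linalg.solve(gram, vlm_feats.T @ target_feats)

def adapt(vlm_feats, W):
    """Apply the adapter and unit-normalize for cosine-similarity retrieval."""
    z = vlm_feats @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)
```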