Vladan Stojnić
@stojnicv.xyz
Ph.D. student at Visual Recognition Group, Czech Technical University in Prague

🔗 https://stojnicv.xyz
October 21, 2025 at 6:36 PM
We show that representations from some foundation models, especially CVLs like CLIP, encode information about image metadata. More surprisingly, we show that such metadata traces can even affect performance on semantic downstream tasks.
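To make the probing idea concrete, here is a minimal sketch: re-encode images at different JPEG qualities, embed them with a frozen CLIP image encoder, and fit a linear probe on the quality label. The model choice, the synthetic data, and the JPEG-quality label are illustrative assumptions, not the paper's exact protocol.

```python
# Rough, self-contained sketch (not the paper's protocol): save random images
# at two JPEG qualities, embed with frozen CLIP, linearly probe the quality.
import io

import numpy as np
import torch
import open_clip
from PIL import Image
from sklearn.linear_model import LogisticRegression

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai"
)
model.eval()

def jpeg_version(img, quality):
    """Round-trip an image through JPEG compression at a given quality."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

@torch.no_grad()
def embed(images):
    batch = torch.stack([preprocess(im) for im in images])
    feats = model.encode_image(batch)
    return torch.nn.functional.normalize(feats, dim=-1).numpy()

# Synthetic stand-in data: random RGB images, each saved at quality 20 and 95.
rng = np.random.default_rng(0)
base = [Image.fromarray(rng.integers(0, 256, (224, 224, 3), dtype=np.uint8))
        for _ in range(40)]
images = [jpeg_version(im, q) for im in base for q in (20, 95)]
labels = np.array([q for _ in base for q in (20, 95)])

X = embed(images)
split = 60  # first 30 base images for training, last 10 for testing
probe = LogisticRegression(max_iter=1000).fit(X[:split], labels[:split])
print("JPEG-quality probe accuracy:", probe.score(X[split:], labels[split:]))
```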
October 21, 2025 at 6:11 PM
I second the Hyperion
October 20, 2025 at 8:46 AM
As for the term CVL, we chose it specifically to distinguish CLIP-like VLMs from VLMs that can generate text, since the term VLM is overused and means many different things in different papers. To an extent, it also follows the naming from arxiv.org/pdf/2405.17247
August 18, 2025 at 3:05 PM
I agree that the terminology is confusing. However, I wouldn't agree that CLIP is an SSL method. It uses a contrastive loss, but not with self-supervised labels. DINOv2 and v3 classify it as weakly supervised, since its labels come from the text.
August 18, 2025 at 3:05 PM
Many thanks to the amazing collaborators: @ryan-ramos.bsky.social, @gkordo.bsky.social, Yuta Nakashima, @gtolias.bsky.social, @noagarciad.bsky.social
August 18, 2025 at 10:48 AM
If this caught your attention, check out our new paper.

Processing and acquisition traces in visual encoders: What does CLIP know about your camera?

arxiv.org/abs/2508.10637

To be presented at #ICCV2025 (highlight). @iccv.bsky.social
August 18, 2025 at 10:48 AM
The same pattern can be observed for the acquisition parameters in the task of near-duplicate retrieval. If the negatives are captured with the same camera as the query, the task becomes harder for some models than when they are captured with a different camera.
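A minimal sketch of this retrieval comparison, assuming you already have L2-normalized embeddings: rank the near-duplicate positive against negatives split by whether they share the query's camera. The random features and camera IDs below are synthetic stand-ins; only the evaluation harness is meant literally.

```python
# Sketch: rank a near-duplicate positive against negatives that either share
# the query's camera or not. Features and camera IDs are random placeholders.
import numpy as np

def l2n(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def positive_rank(q, pos, negs):
    """1-based rank of the positive among positive + negatives, by cosine sim."""
    return 1 + int((negs @ q >= pos @ q).sum())

rng = np.random.default_rng(0)
q = l2n(rng.normal(size=512))
pos = l2n(q + 0.3 * rng.normal(size=512))    # near-duplicate of the query
negs = l2n(rng.normal(size=(200, 512)))
neg_cams = rng.integers(0, 4, size=200)      # hypothetical camera IDs
q_cam = 0

print("rank vs same-camera negatives:",
      positive_rank(q, pos, negs[neg_cams == q_cam]))
print("rank vs different-camera negatives:",
      positive_rank(q, pos, negs[neg_cams != q_cam]))
```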
August 18, 2025 at 10:48 AM
The impact on semantic performance is again most pronounced for contrastive VLMs and least pronounced for SSL models.

Here, we show kNN classification in several settings, depending on whether the semantic positives and negatives share the same processing parameters as the test image.
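A rough sketch of such a controlled kNN evaluation, under the assumption that each gallery item carries a class label and a processing-parameter ID: same-class items either share the test image's processing parameters ("aligned") or deliberately differ. All names and the synthetic data are placeholders, not the paper's setup.

```python
# Sketch of a controlled kNN evaluation: build a gallery where same-class
# items share (or do not share) the test image's processing parameters,
# then classify by majority vote over the nearest neighbors.
import numpy as np
from collections import Counter

def knn_predict(test_feat, gallery_feats, gallery_labels, k=5):
    """Majority vote over the k most cosine-similar gallery items."""
    top = np.argsort(-(gallery_feats @ test_feat))[:k]
    return Counter(gallery_labels[top].tolist()).most_common(1)[0][0]

def controlled_gallery(feats, labels, procs, test_label, test_proc, aligned):
    """aligned=True: same-class items share the test image's processing
    parameters while other-class items do not; aligned=False: the reverse."""
    same_proc = procs == test_proc
    keep = np.where(labels == test_label, same_proc == aligned, same_proc != aligned)
    return feats[keep], labels[keep]

# Synthetic placeholder gallery: 200 items, 5 classes, 2 processing settings.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 64))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
labels = rng.integers(0, 5, size=200)
procs = rng.integers(0, 2, size=200)

test_feat, test_label, test_proc = feats[0], labels[0], procs[0]
for aligned in (True, False):
    g_feats, g_labels = controlled_gallery(feats[1:], labels[1:], procs[1:],
                                           test_label, test_proc, aligned)
    pred = knn_predict(test_feat, g_feats, g_labels)
    print(f"aligned={aligned}: predicted class {pred} (true {test_label})")
```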
August 18, 2025 at 10:48 AM
This impact is especially pronounced when there is a strong correlation or anticorrelation between the semantic and metadata labels, e.g., when the semantic positives share the same processing parameters as the query image while the negatives do not.
August 18, 2025 at 10:48 AM
More strikingly, we show that traces of these metadata labels (processing and acquisition parameters) can significantly impact semantic recognition abilities.
August 18, 2025 at 10:48 AM
A similar pattern is observed for the acquisition parameters, although all models generally have a harder time predicting these parameters than the processing ones.
August 18, 2025 at 10:48 AM