Thibaut Loiseau
thibautloiseau.bsky.social
PhD Student at IMAGINE (ENPC)

Working on camera pose estimation

thibautloiseau.github.io
I will be at #CVPR2025 to present this work (RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges) at 4pm, poster #88.
Come if you want to discuss!
June 15, 2025 at 8:28 PM
Reposted by Thibaut Loiseau
I am heartbroken that I am not at the conference, but seeing what the government is doing to its people and the world, I simply couldn't go there.
June 14, 2025 at 9:51 AM
In the end, we might not need explicit correspondences during pre-training, since they may already emerge implicitly, as was observed in CroCo. Also, the checks in the pipeline are done in 3D, and it is difficult to get pixel-level correspondences with the current approach.
March 11, 2025 at 1:34 PM
Hi Johan, thanks for your question :) For now, each pixel only has its associated class, but we might be able to add explicit correspondences between pixels in the pipeline.
March 11, 2025 at 1:34 PM
13/13 For more details, check out our paper: arxiv.org/abs/2503.07561 and feel free to reach out with questions!
Alligat0R: Pre-Training Through Co-Visibility Segmentation for Relative Camera Pose Regression
March 11, 2025 at 10:52 AM
12/13 Code and the Cub3 dataset will be released soon. Stay tuned!
March 11, 2025 at 10:52 AM
11/13 The implications are exciting: Alligat0R enables more robust visual perception systems that can handle and benefit from challenging real-world scenarios with varying degrees of overlap between views.
March 11, 2025 at 10:52 AM
10/13 Our approach not only improves performance but also provides interpretable visualizations of the model's geometric understanding through its segmentation outputs.
March 11, 2025 at 10:52 AM
9/13 Alligat0R works particularly well on difficult pairs, maintaining strong performance even as overlap decreases, while CroCo's accuracy drops dramatically below 40% overlap.
March 11, 2025 at 10:52 AM
8/13 On the RUBIK benchmark, our method achieves 60.3% accuracy (at 5°/2m threshold) compared to just 19.1% for the best CroCo model!
March 11, 2025 at 10:52 AM
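The 5°/2m accuracy figure above can be read as the fraction of image pairs whose rotation error is below 5 degrees and translation error below 2 meters. Here is a minimal sketch of such a metric; the exact error definitions used in the RUBIK benchmark are assumptions, not taken from the paper.

```python
import numpy as np

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def pose_accuracy(preds, gts, rot_thresh=5.0, trans_thresh=2.0):
    """Fraction of pairs within BOTH thresholds.

    preds, gts: lists of (R, t) with R a (3, 3) rotation and
    t a (3,) translation in meters (illustrative convention).
    """
    hits = 0
    for (Rp, tp), (Rg, tg) in zip(preds, gts):
        rot_ok = rotation_error_deg(Rp, Rg) < rot_thresh
        trans_ok = np.linalg.norm(tp - tg) < trans_thresh
        hits += rot_ok and trans_ok
    return hits / len(preds)
```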
7/13 In experiments, Alligat0R significantly outperforms CroCo on relative pose regression using the same architecture, especially in challenging scenarios with limited overlap between views.
March 11, 2025 at 10:52 AM
6/13 To enable this approach, we created Cub3, a large-scale dataset with 2.5M image pairs and dense co-visibility annotations derived from nuScenes, featuring pairs with varying degrees of overlap, scale ratios, and viewpoint angles.
March 11, 2025 at 10:52 AM
5/13 This formulation offers major advantages:
- Can use image pairs with ANY degree of overlap
- Provides interpretable outputs showing the model's 3D understanding
- Better aligns with downstream binocular vision tasks
March 11, 2025 at 10:52 AM
4/13 The key insight: For each pixel in one image, we explicitly predict whether it is:
- Co-visible in the second image
- Occluded in the second image
- Outside the field of view in the second image
March 11, 2025 at 10:52 AM
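The three-class labeling above can be sketched from depth maps and relative pose: project each view-1 pixel into view 2, mark it out-of-view if it lands outside the image, occluded if view 2 sees a closer surface along that ray, and co-visible otherwise. This is a minimal illustrative sketch with assumed class ids, conventions, and occlusion tolerance — not the paper's actual pipeline.

```python
import numpy as np

# Assumed class ids for the three-way segmentation.
COVISIBLE, OCCLUDED, OUT_OF_VIEW = 0, 1, 2

def covisibility_labels(depth1, depth2, K, R, t, tol=0.05):
    """Label each pixel of view 1 w.r.t. view 2.

    depth1, depth2: (H, W) depth maps; K: (3, 3) intrinsics;
    R, t: relative pose mapping view-1 points into view 2's frame.
    """
    H, W = depth1.shape
    v, u = np.mgrid[0:H, 0:W]
    # Back-project view-1 pixels to 3D, then transform into view 2.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    pts1 = np.linalg.inv(K) @ pix * depth1.reshape(1, -1)
    pts2 = R @ pts1 + t.reshape(3, 1)
    proj = K @ pts2
    z = proj[2]
    x, y = proj[0] / z, proj[1] / z
    # Default: pixel does not reproject into view 2's image plane.
    labels = np.full(H * W, OUT_OF_VIEW, dtype=np.int64)
    inside = (z > 0) & (x >= 0) & (x < W) & (y >= 0) & (y < H)
    ui, vi = x[inside].astype(int), y[inside].astype(int)
    # Occluded if view 2 sees a closer surface along the same ray.
    occluded = z[inside] > depth2[vi, ui] * (1 + tol)
    labels[inside] = np.where(occluded, OCCLUDED, COVISIBLE)
    return labels.reshape(H, W)
```

With an identity pose and matching depth maps every pixel comes out co-visible; a large sideways translation pushes pixels out of view, and a uniformly closer depth map in view 2 marks them occluded.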
3/13 We introduce Alligat0R, which reformulates this problem as a co-visibility segmentation task instead of trying to reconstruct masked regions of images.
March 11, 2025 at 10:52 AM
2/13 Current methods for training vision models to understand 3D relationships between images (like CroCo) require substantial overlap (>50%) between training image pairs. This limits their effectiveness in many real-world scenarios.
March 11, 2025 at 10:52 AM