Thibaut Loiseau
thibautloiseau.bsky.social
PhD Student at IMAGINE (ENPC)

Working on camera pose estimation

thibautloiseau.github.io
I will be at #CVPR2025 to present this work (RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges) at 4pm, poster #88.
Come if you want to discuss!
June 15, 2025 at 8:28 PM
Reposted by Thibaut Loiseau
I am heartbroken that I am not at the conference, but seeing what the government is doing to its people and the world, I simply couldn't go there.
June 14, 2025 at 9:51 AM
In the end, we might not need explicit correspondences during pre-training, since they may already emerge implicitly, as was observed in CroCo. Also, the checks in the pipeline are done in 3D, and it is difficult to get pixel-level correspondences with the current approach.
March 11, 2025 at 1:34 PM
Hi Johan, thanks for your question :) For now, each pixel only has its associated class, but we might be able to add explicit correspondences between pixels in the pipeline.
March 11, 2025 at 1:34 PM
13/13 For more details, check out our paper: arxiv.org/abs/2503.07561 and feel free to reach out with questions!
Alligat0R: Pre-Training Through Co-Visibility Segmentation for Relative Camera Pose Regression
March 11, 2025 at 10:52 AM
12/13 Code and the Cub3 dataset will be released soon. Stay tuned!
March 11, 2025 at 10:52 AM
11/13 The implications are exciting: Alligat0R enables more robust visual perception systems that can handle and benefit from challenging real-world scenarios with varying degrees of overlap between views.
March 11, 2025 at 10:52 AM
10/13 Our approach not only improves performance but also provides interpretable visualizations of the model's geometric understanding through its segmentation outputs.
March 11, 2025 at 10:52 AM
9/13 Alligat0R works particularly well on difficult pairs, maintaining strong performance even as overlap decreases, while CroCo's accuracy drops dramatically below 40% overlap.
March 11, 2025 at 10:52 AM
8/13 On the RUBIK benchmark, our method achieves 60.3% accuracy (at 5°/2m threshold) compared to just 19.1% for the best CroCo model!
March 11, 2025 at 10:52 AM
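The 5°/2m accuracy figure above can be read as the fraction of image pairs whose rotation error is below 5 degrees and translation error below 2 meters. Here is a minimal sketch of such a metric; the exact error definitions used in the RUBIK benchmark are assumptions, not taken from the paper.

```python
import numpy as np

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def pose_accuracy(preds, gts, rot_thresh=5.0, trans_thresh=2.0):
    """Fraction of pairs within BOTH thresholds.

    preds, gts: lists of (R, t) with R a (3, 3) rotation and
    t a (3,) translation in meters (illustrative convention).
    """
    hits = 0
    for (Rp, tp), (Rg, tg) in zip(preds, gts):
        rot_ok = rotation_error_deg(Rp, Rg) < rot_thresh
        trans_ok = np.linalg.norm(tp - tg) < trans_thresh
        hits += rot_ok and trans_ok
    return hits / len(preds)
```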
7/13 In experiments, Alligat0R significantly outperforms CroCo on relative pose regression using the same architecture, especially in challenging scenarios with limited overlap between views.
March 11, 2025 at 10:52 AM
6/13 To enable this approach, we created Cub3, a large-scale dataset with 2.5M image pairs and dense co-visibility annotations derived from nuScenes, featuring pairs with varying degrees of overlap, scale ratios, and viewpoint angles.
March 11, 2025 at 10:52 AM
5/13 This formulation offers major advantages:
- Can use image pairs with ANY degree of overlap
- Provides interpretable outputs showing the model's 3D understanding
- Better aligns with downstream binocular vision tasks
March 11, 2025 at 10:52 AM
4/13 The key insight: For each pixel in one image, we explicitly predict whether it is:
- Co-visible in the second image
- Occluded in the second image
- Outside the field of view in the second image
March 11, 2025 at 10:52 AM
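The three-class labeling above can be sketched from depth maps and relative pose: project each view-1 pixel into view 2, mark it out-of-view if it lands outside the image, occluded if view 2 sees a closer surface along that ray, and co-visible otherwise. This is a minimal illustrative sketch with assumed class ids, conventions, and occlusion tolerance — not the paper's actual pipeline.

```python
import numpy as np

# Assumed class ids for the three-way segmentation.
COVISIBLE, OCCLUDED, OUT_OF_VIEW = 0, 1, 2

def covisibility_labels(depth1, depth2, K, R, t, tol=0.05):
    """Label each pixel of view 1 w.r.t. view 2.

    depth1, depth2: (H, W) depth maps; K: (3, 3) intrinsics;
    R, t: relative pose mapping view-1 points into view 2's frame.
    """
    H, W = depth1.shape
    v, u = np.mgrid[0:H, 0:W]
    # Back-project view-1 pixels to 3D, then transform into view 2.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    pts1 = np.linalg.inv(K) @ pix * depth1.reshape(1, -1)
    pts2 = R @ pts1 + t.reshape(3, 1)
    proj = K @ pts2
    z = proj[2]
    x, y = proj[0] / z, proj[1] / z
    # Default: pixel does not reproject into view 2's image plane.
    labels = np.full(H * W, OUT_OF_VIEW, dtype=np.int64)
    inside = (z > 0) & (x >= 0) & (x < W) & (y >= 0) & (y < H)
    ui, vi = x[inside].astype(int), y[inside].astype(int)
    # Occluded if view 2 sees a closer surface along the same ray.
    occluded = z[inside] > depth2[vi, ui] * (1 + tol)
    labels[inside] = np.where(occluded, OCCLUDED, COVISIBLE)
    return labels.reshape(H, W)
```

With an identity pose and matching depth maps every pixel comes out co-visible; a large sideways translation pushes pixels out of view, and a uniformly closer depth map in view 2 marks them occluded.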
3/13 We introduce Alligat0R, which reformulates this problem as a co-visibility segmentation task instead of trying to reconstruct masked regions of images.
March 11, 2025 at 10:52 AM
2/13 Current methods for training vision models to understand 3D relationships between images (like CroCo) require substantial overlap (>50%) between training image pairs. This limits their effectiveness in many real-world scenarios.
March 11, 2025 at 10:52 AM