Christian Wolf
@chriswolfvision.bsky.social
Principal Scientist at Naver Labs Europe, Lead of Spatial AI team. AI for Robotics, Computer Vision, Machine Learning. Austrian in France. https://chriswolfvision.github.io/www/
I stumbled upon an old post made on the other site, from a moment of .... inspiration ...
November 12, 2025 at 5:39 PM
Found on the site which must not be named. #ICLR2025
November 12, 2025 at 12:37 PM
You are not the only one who had that thought, @fguney.bsky.social seems to share this 😂 (and me too).

This should also be called "l'appel de la catastrophe" ("the call of catastrophe") ...
November 12, 2025 at 7:04 AM
4. I also plug in some of our own figures mixing 3D vs 2D:

arxiv.org/abs/2507.01667
openaccess.thecvf.com/content_cvpr...
arxiv.org/abs/2307.16710
November 10, 2025 at 7:10 PM
3. I also appreciate when 3D is mixed with 2D in a helpful way, as often stuff IS 3D: either because of 3D vision, or just because of different tensor dims.

arxiv.org/abs/2402.14817
(Cameras as rays by CMU, ICLR 2024)
November 10, 2025 at 7:10 PM
2. Colors: that's surely subjective, but I think you can't go wrong with pastel; it's the least aggressive.

arxiv.org/abs/2504.14151
(Nice paper by FAIR/Meta, but I think the figure could have used some more detail on where the Q, K, Vs go)
November 10, 2025 at 7:10 PM
So, I like a couple of things.

1. When architecture information is combined with a Figure showing what actually happens with the data:
November 10, 2025 at 7:10 PM
A new model for human mesh recovery, high-performing and w/o using any 3D scans, has been published by my colleagues at @naverlabseurope.bsky.social. Excellent work!
November 6, 2025 at 11:37 AM
Early morning train rides often provide spectacular views.
(between Lyon and Grenoble).
November 4, 2025 at 7:45 AM
Well played, #NeurIPS2025 video recording software (SlidesLive)
😂
November 3, 2025 at 6:01 PM
The arxiv link: arxiv.org/abs/2510.26443
October 31, 2025 at 12:10 PM
PointSt3R: Point Tracking through 3D Grounded Correspondence

R. Guerrier, @adamharley.bsky.social, @dimadamen.bsky.social
Bristol/Meta

rhodriguerrier.github.io/PointSt3R/
October 31, 2025 at 9:22 AM
For a computer vision researcher this is pure gold.
October 28, 2025 at 2:44 PM
The first movie projector by the Lumière brothers in Lyon (musée lumière à Lyon).
October 28, 2025 at 2:15 PM
The final boss of our house.
October 28, 2025 at 1:05 PM
This is the first time I see aphantasia (the inability of ~3% of people to mentally visualize images) used as an argument for CV and AI.

TLDR: the authors suggest that rendering and physics simulation are not dissociated; both are necessary for world models.

arxiv.org/abs/2510.208...
Luo et al.
October 27, 2025 at 9:59 PM
Distill transformers into Mamba by learning MLP projections between the transformer's Q,K weights and Mamba's B,C matrices.

arxiv.org/abs/2510.19266
Wang et al., NUS, UTA
October 27, 2025 at 12:24 PM
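A minimal numpy sketch of the idea described in the post above, not the paper's code: small MLPs map token inputs to Mamba's input-dependent B and C matrices, supervised to mimic the frozen transformer head's K and Q projections. All shapes, the hidden size, and the exact Q↔C / K↔B correspondence are my illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_state, seq = 16, 4, 8

# Frozen "teacher" transformer head projections.
W_q = rng.normal(size=(d_model, d_state))
W_k = rng.normal(size=(d_model, d_state))

def init_mlp(d_in, d_hidden, d_out):
    """Random one-hidden-layer MLP parameters."""
    return [rng.normal(scale=0.1, size=(d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(scale=0.1, size=(d_hidden, d_out)), np.zeros(d_out)]

def mlp(x, params):
    """Forward pass of a one-hidden-layer MLP with tanh."""
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

# Student MLPs produce Mamba's input-dependent C and B matrices,
# here trained so that C(x) mimics q = x W_q and B(x) mimics k = x W_k
# (C plays a query-like role in the SSM readout, B a key-like one).
p_c = init_mlp(d_model, 32, d_state)
p_b = init_mlp(d_model, 32, d_state)

x = rng.normal(size=(seq, d_model))
q, k = x @ W_q, x @ W_k
C, B = mlp(x, p_c), mlp(x, p_b)

# A distillation objective would minimize this mismatch over data.
loss = np.mean((C - q) ** 2) + np.mean((B - k) ** 2)
```

In an actual distillation run the loss would of course be minimized by gradient descent over a dataset; the sketch only shows the wiring of the projection.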
Midway networks are cool: representation learning of motion and reconstruction jointly. I see similar motivation in V-JEPA 2 "AC", but I really like the execution here:
- hierarchical,
- backwards features with cross-attention.

arxiv.org/abs/2510.05558
C. Hoang, @mengyer.bsky.social
NYU
October 27, 2025 at 12:12 PM
Delving into off-policy RL algorithms.
Today: the Ripley buffer.

(Repost from the old site from Oct. 2024, but I just rewatched Aliens for probably the 15th time or so)
October 26, 2025 at 9:23 PM
I suspect that the public outcry over data centers powered by gas turbines would be far bigger if more people knew what they actually are: jet engines driving a shaft (and then, a generator) instead of pushing an aircraft.

Imagine a 747 at full thrust 24h/day in your neighborhood.
October 25, 2025 at 10:16 AM
A PhD student starts a new project with multiple supervisors.
October 24, 2025 at 4:32 PM
Kinaema is trained with pose supervision and optionally with masked image modelling.

Interestingly, even when training with relative pose only, we can show through probing experiments that occupancy maps are encoded in the maintained memory.

9/9
October 24, 2025 at 7:18 AM
Kinaema's memory is composed of a set of embeddings, and the attention of the same scene patch to embeddings seems to follow a stable pattern.

8/9
October 24, 2025 at 7:18 AM
We outperform classical recurrent sequence models, including other recurrent transformers.

We train on sequences of length T=100 and show generalization up to T=800 and T=1000, which we believe is unprecedented.

7/9
October 24, 2025 at 7:18 AM
At a very high level, the difference between classical relative pose estimation and Kinaema is:

- relative pose estimation compares the camera poses of two images
- Kinaema estimates the relative pose between an image and the agent's memory, where the memory holds the scene and the current agent position.

6/9
October 24, 2025 at 7:18 AM
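The two-image vs. image-to-memory distinction from the post above can be sketched with toy interfaces. This is purely illustrative, not Kinaema's actual architecture: all names, shapes, the slot count, and the simplistic attention-style memory update are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d_img, d_emb, n_slots = 64, 32, 6

# Fixed toy weights; a real system would learn these end to end.
W_enc = rng.normal(scale=0.1, size=(d_img, d_emb))
W_pair = rng.normal(scale=0.1, size=(2 * d_emb, 3))
W_mem = rng.normal(scale=0.1, size=(n_slots * d_emb + d_emb, 3))

def encode(img_feat):
    """Stand-in image encoder (a vision backbone in practice)."""
    return np.tanh(img_feat @ W_enc)

def rel_pose_pairwise(img_a, img_b):
    """Classical setting: relative pose from TWO images."""
    z = np.concatenate([encode(img_a), encode(img_b)])
    return z @ W_pair  # toy 3-DoF pose (x, y, yaw)

def update_memory(memory, img_feat):
    """Toy recurrent update: mix a new observation into the slots
    with a softmax over slot/observation similarity."""
    z = encode(img_feat)
    scores = memory @ z
    att = np.exp(scores - scores.max())
    att /= att.sum()
    return memory + att[:, None] * z[None, :]

def rel_pose_to_memory(memory, img_feat):
    """Kinaema-style setting: pose of ONE image relative to the
    agent's accumulated memory."""
    z = encode(img_feat)
    return np.concatenate([memory.ravel(), z]) @ W_mem

# Usage: the agent accumulates observations into a set of memory
# slots, then localizes a query image against that memory rather
# than against a single reference frame.
memory = np.zeros((n_slots, d_emb))
for _ in range(5):
    memory = update_memory(memory, rng.normal(size=d_img))
pose = rel_pose_to_memory(memory, rng.normal(size=d_img))
pose_pair = rel_pose_pairwise(rng.normal(size=d_img), rng.normal(size=d_img))
```

The point of the contrast: `rel_pose_pairwise` needs both frames present, while `rel_pose_to_memory` only needs the compact memory, which persists as the agent moves.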