valeo.ai
@valeoai.bsky.social
We are a research team working on artificial intelligence for automotive applications, toward assisted and autonomous driving.
--> https://valeoai.github.io/ <--
Reposted by valeo.ai
The unreasonable magic of simplicity!
Meet DrivoR (Driving on Registers): our latest end2end autonomous driving model.
We tore down the complex dependencies & modules of current models to
obtain a pure Transformer-based SOTA driving agent (NAVSIM v1 & v2, HUGSIM).
Find out more 👇
1/🧵 Q: Can we have both a simple and SOTA architecture in autonomous driving?
R: Yes! 😍
Introducing Driving on Registers (DrivoR):
a pure Transformer backbone that achieves SOTA results in NAVSIM v1 / v2 and closed-loop HUGSIM evaluation.
Here is how 👇
January 9, 2026 at 5:02 PM
7/ 📄 Read the paper & get the code: valeoai.github.io/driving-on-r...

Congratulations to the whole team!
January 9, 2026 at 5:00 PM
6/ Furthermore, this scoring architecture allowed us to tweak the agent's behavior.

We were able to induce a more passive, safer driving style, which proved important for reaching SOTA performance on the rigorous NAVSIM v2 benchmark. 🛡️
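A minimal sketch of what such score tweaking could look like (the penalty form and all names here are illustrative assumptions, not the paper's exact mechanism):

import torch

def reweight_scores(scores, mean_speeds, passive_bias=0.1):
    # scores: (N,) learned score per candidate trajectory
    # mean_speeds: (N,) mean speed of each candidate, in m/s
    # Illustrative: subtract a speed-proportional penalty to bias
    # selection toward more passive, safer trajectories.
    return scores - passive_bias * mean_speeds

scores = torch.tensor([0.80, 0.75, 0.70])
speeds = torch.tensor([12.0, 8.0, 5.0])
best = torch.argmax(reweight_scores(scores, speeds))  # now favors the slower candidate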
January 9, 2026 at 4:57 PM
5/ Given the success of trajectory scoring methods (like GTRS), we dove deep into the scoring module.
Thanks to the wizardry of Yihong Xu, we discovered that disentangling the tokens used for generation from those used for scoring was key.
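A minimal sketch of the disentangling idea, assuming separate learnable query sets decoded from the same scene tokens (a simplification, not DrivoR's exact architecture):

import torch, torch.nn as nn

class DisentangledHeads(nn.Module):
    # Separate learnable queries for trajectory generation and for
    # trajectory scoring; both read the same scene tokens, but the
    # two token sets never mix.
    def __init__(self, d=256, n_gen=8, n_score=8):
        super().__init__()
        self.gen_q = nn.Parameter(torch.randn(1, n_gen, d))
        self.score_q = nn.Parameter(torch.randn(1, n_score, d))
        layer = nn.TransformerDecoderLayer(d, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)

    def forward(self, scene_tokens):  # (B, T, d)
        B = scene_tokens.size(0)
        gen = self.decoder(self.gen_q.expand(B, -1, -1), scene_tokens)
        score = self.decoder(self.score_q.expand(B, -1, -1), scene_tokens)
        return gen, score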
January 9, 2026 at 4:56 PM
4/ This mimics human driving intuition! 🧠
We pay max attention to the road ahead (front camera), while only occasionally glancing at the rear (back camera).
Visualizing the attention maps confirms this: front tokens specialize; back tokens collapse to a single pattern.
January 9, 2026 at 4:56 PM
3/ These registers act as "scene-tokens" and demonstrate signs of learned compression.
Cosine similarity analysis reveals high differentiation for the front camera, while representations progressively "collapse" as we move toward the back camera.
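Such a collapse analysis is easy to reproduce on any token set (toy data below, just to show the measurement):

import torch
import torch.nn.functional as F

def mean_offdiag_cosine(tokens):
    # tokens: (N, d); high off-diagonal similarity = "collapsed" tokens
    t = F.normalize(tokens, dim=-1)
    sim = t @ t.T
    mask = ~torch.eye(len(t), dtype=torch.bool)
    return sim[mask].mean()

front = torch.randn(16, 256)                              # differentiated
back = torch.randn(1, 256) + 0.01 * torch.randn(16, 256)  # near-identical
print(mean_offdiag_cosine(front), mean_offdiag_cosine(back))  # low vs ~1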
January 9, 2026 at 4:56 PM
2/ We explored how best to leverage a pre-trained ViT as the image encoder.
We equip DINOv2 with register tokens and LoRA-finetune it on driving data, reducing the number of patch tokens by over 250x via camera-aware register tokens.
This efficiency could benefit future work on VLMs for driving.
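A rough sketch of the compression under stated assumptions (layer counts, sizes, and where the registers attach are illustrative): per-camera learnable registers join the patch tokens, and only the registers are kept, e.g. ~4096 patches down to 16 tokens per camera, in the 250x ballpark.

import torch, torch.nn as nn

class CameraRegisters(nn.Module):
    def __init__(self, n_cams=8, n_reg=16, d=768):
        super().__init__()
        # one learnable register set per camera ("camera-aware")
        self.registers = nn.Parameter(torch.randn(n_cams, n_reg, d))
        layer = nn.TransformerEncoderLayer(d, nhead=12, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, patch_tokens, cam_idx):  # (B, P, d), camera id
        B, P, _ = patch_tokens.shape
        reg = self.registers[cam_idx].expand(B, -1, -1)
        x = self.blocks(torch.cat([patch_tokens, reg], dim=1))
        return x[:, P:]  # keep only registers: P patches -> n_reg tokens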
January 9, 2026 at 4:55 PM
1/🧵 Q: Can we have both a simple and SOTA architecture in autonomous driving?
R: Yes! 😍
Introducing Driving on Registers (DrivoR):
a pure Transformer backbone that achieves SOTA results in NAVSIM v1 / v2 and closed-loop HUGSIM evaluation.
Here is how 👇
January 9, 2026 at 4:55 PM
Our @spyrosgidaris.bsky.social is speaking this morning (Wed, Dec 10th, 11:00 am Paris time) about "Latent Representations for Better Generative Image Modeling" in the Hi! PARIS - ELLIS monthly seminar.

The talk will be live-streamed: www.hi-paris.fr/2025/09/26/a...
AI Seminar Cycle – Hi! PARIS
www.hi-paris.fr
December 10, 2025 at 9:15 AM
Perfect timing for this keynote on open, re-purposable foundation models at #aiPULSE2025
@abursuc.bsky.social taking the stage this afternoon! πŸ‘‡
I'm speaking at #aiPULSE2025 today on Open & re-purposable foundation models for the automotive industry.
The morning keynotes talked a lot about open source so my slide here might be timely.
December 4, 2025 at 12:14 PM
Find out more about all these works at the posters, over a coffee or, if you're shy, on our webpage: valeoai.github.io/posts/neurip...
valeo.ai at NeurIPS 2025 | valeo.ai - valeo.ai research page
Loïck Chambon, Spyros Gidaris, Andrei Bursuc, Eloi Zablocki
valeoai.github.io
December 3, 2025 at 10:52 PM
IPA: An Information-Preserving Input Projection Framework for Efficient Foundation Model Adaptation

by: Y. Yin, S. Venkataramanan, T.H. Vu, A. Bursuc, M. Cord
📄: arxiv.org/abs/2509.04398

tl;dr: a PEFT method that improves upon LoRA by explicitly preserving information in the low-rank space
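A toy illustration of the information-preserving idea (the construction here, PCA of the layer's input activations, is our assumption for illustration; see the paper for IPA's actual method):

import torch

def info_preserving_A(features, rank):
    # features: (N, d) activations entering the frozen layer.
    # Use the top principal directions as the down-projection, so the
    # low-rank space retains as much feature variance as possible,
    # instead of LoRA's random init.
    centered = features - features.mean(dim=0)
    _, _, Vh = torch.linalg.svd(centered, full_matrices=False)
    return Vh[:rank]  # (rank, d) down-projection

A = info_preserving_A(torch.randn(1024, 768), rank=16)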
December 3, 2025 at 10:52 PM
Multi-Token Prediction Needs Registers

by: A. Gerontopoulos, S. Gidaris, N. Komodakis
📄: arxiv.org/abs/2505.10518

tl;dr: a simple way to enable multi-token prediction in LLMs by interleaving learnable "register tokens" into the input sequence to forecast future targets.
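A minimal sketch of the interleaving trick (placement and training details simplified; the causal transformer between embedding and head is omitted):

import torch, torch.nn as nn

d, vocab, k = 64, 100, 2
register = nn.Parameter(torch.randn(d))  # one shared learnable register
embed, head = nn.Embedding(vocab, d), nn.Linear(d, vocab)

tokens = torch.randint(0, vocab, (1, 8))  # (B, T)
x = embed(tokens)                         # (B, T, d)
# interleave: t0, REG, t1, REG, ... -> (B, 2T, d)
x = torch.stack([x, register.expand_as(x)], dim=2).flatten(1, 2)
# ... run a causal transformer over x here ...
logits = head(x)[:, 1::2]                        # outputs at register slots
targets = torch.roll(tokens, shifts=-k, dims=1)  # token k steps ahead
# (in real training the last k wrapped positions would be masked out)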
December 3, 2025 at 10:51 PM
Boosting Generative Image Modeling via Joint Image-Feature Synthesis

by: T. Kouzelis, E. Karypidis, I. Kakogeorgiou, S. Gidaris, N. Komodakis
📄: arxiv.org/abs/2504.16064

tl;dr: improve generation w/ a single diffusion model to jointly synthesize low-level latents & high-level semantic features
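In shapes, the joint target could look like this (channel sizes and the corruption schedule are illustrative assumptions):

import torch

B, c_lat, c_feat, h, w = 2, 4, 8, 32, 32
latents = torch.randn(B, c_lat, h, w)       # low-level VAE latents
features = torch.randn(B, c_feat, h, w)     # projected semantic features
x0 = torch.cat([latents, features], dim=1)  # one joint diffusion target

t = torch.rand(B, 1, 1, 1)
noise = torch.randn_like(x0)
x_t = (1 - t) * x0 + t * noise              # a single model denoises both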
December 3, 2025 at 10:51 PM
Learning to Steer: Input-dependent Steering for Multimodal LLMs

by: J. Parekh, P. Khayatan, M. Shukor, A. Dapogny, A. Newson, M. Cord
📄: arxiv.org/abs/2508.12815

tl;dr: steering multimodal LLMs (MLLMs) by training a lightweight auxiliary module to predict input-specific steering vectors
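The gist in a few lines (dimensions and where the vector is injected are assumptions for illustration):

import torch, torch.nn as nn

class SteeringModule(nn.Module):
    # Lightweight module that predicts an input-specific steering
    # vector and adds it back to the frozen MLLM's hidden states.
    def __init__(self, d=4096, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                                 nn.Linear(hidden, d))

    def forward(self, hidden_states):            # (B, T, d)
        v = self.mlp(hidden_states.mean(dim=1))  # pool -> steering vector
        return hidden_states + v[:, None, :]     # input-dependent shift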
December 3, 2025 at 10:51 PM
DINO-Foresight: Looking into the Future with DINO

by E. Karypidis, I. Kakogeorgiou, S. Gidaris, N. Komodakis
📄: arxiv.org/abs/2412.11673

tl;dr: self-supervision by predicting future scene dynamics in the semantic feature space of foundation models (like DINO) rather than generating costly pixels.
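The training signal, in sketch form (predictor architecture and frame count are illustrative assumptions):

import torch, torch.nn as nn

d = 384  # e.g. DINO ViT-S feature dim
predictor = nn.Sequential(nn.Linear(2 * d, d), nn.GELU(), nn.Linear(d, d))

# frozen-encoder features of two past frames and the future frame
feat_t0, feat_t1, feat_t2 = (torch.randn(2, 196, d) for _ in range(3))

pred = predictor(torch.cat([feat_t0, feat_t1], dim=-1))
loss = nn.functional.mse_loss(pred, feat_t2)  # feature space, no pixels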
December 3, 2025 at 10:50 PM
JAFAR: Jack up Any Feature at Any Resolution

by P. Couairon, L. Chambon, L. Serrano, M. Cord, N. Thome
📄: arxiv.org/abs/2506.11136

tl;dr: lightweight, flexible, plug & play upsampler that scales features from any vision foundation model to arbitrary resolutions w/o needing high-res supervision
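What this buys you, in shapes (the loader and call signature below are hypothetical placeholders, not JAFAR's actual API; see the paper and repo for that):

import torch
import torch.nn.functional as F

feats = torch.randn(1, 384, 16, 16)  # low-res ViT patch features
target = (224, 224)

blurry = F.interpolate(feats, size=target, mode="bilinear")  # baseline
# upsampler = load_jafar(...)                   # hypothetical loader
# sharp = upsampler(image, feats, size=target)  # image-guided upsampling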
December 3, 2025 at 10:50 PM
Check out our works at @NeurIPSConf #NeurIPS2025 this week!
We present 5 full papers + 1 workshop paper on:
💡 self-supervised & representation learning
🖼️ generative image models
🧠 finetuning and understanding LLMs & multimodal LLMs
🔎 feature upsampling

valeoai.github.io/posts/neurip...
December 3, 2025 at 10:50 PM
Reposted by valeo.ai
We fermented our thoughts on understanding LoRA & ended up with IPA🍺
We found an asymmetry in LoRA: during training, A changes little & B eats most task-specific adaptation.
So we pre-train A to preserve information before adaptation w/ excellent parameter efficiency #NeurIPS2025 #CCFM 👇
1/Serve your PEFT with a fresh IPA!🍺
Finetuning large models is cheaper thanks to LoRA, but is its random init optimal?🤔
Meet IPA: a feature-aware alternative to random projections
#NeurIPS2025 WS #CCFM Oral+Best Paper
Work w/
S. Venkataramanan @tuanhungvu.bsky.social @abursuc.bsky.social M. Cord
🧵
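A hypothetical way to check that asymmetry yourself (random stand-ins for the trained factors; in real LoRA, B starts at zero, so you would measure absolute rather than relative drift there):

import torch

def rel_drift(init, trained):
    # relative change of a LoRA factor from its initialization
    return (trained - init).norm() / (init.norm() + 1e-8)

A0, B0 = torch.randn(16, 768), torch.randn(768, 16)
A1 = A0 + 0.01 * torch.randn_like(A0)  # A barely moves
B1 = B0 + 0.80 * torch.randn_like(B0)  # B absorbs the adaptation
print(rel_drift(A0, A1), rel_drift(B0, B1))  # small vs large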
December 2, 2025 at 11:16 AM
Reposted by valeo.ai
1/Serve your PEFT with a fresh IPA!🍺
Finetuning large models is cheaper thanks to LoRA, but is its random init optimal?🤔
Meet IPA: a feature-aware alternative to random projections
#NeurIPS2025 WS #CCFM Oral+Best Paper
Work w/
S. Venkataramanan @tuanhungvu.bsky.social @abursuc.bsky.social M. Cord
🧵
December 2, 2025 at 11:11 AM
Reposted by valeo.ai
That was a cool project brilliantly led by Ellington Kirby during his internship.
We were curious if we could train diffusion models on sets of point coordinates.

For images, this is a step towards spatial diffusion, with pixels reorganizing themselves instead of diffusing in RGB value space only.
LOGen: Toward Lidar Object Generation by Point Diffusion

by: E. Kirby, @mickaelchen.bsky.social, R. Marlet, N. Samet

tl;dr: a diffusion-based method producing lidar point clouds of dataset objects, with an extensive control of the generation

📄 arxiv.org/abs/2412.07385
Code: ✅
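The core twist, sketched (noise schedule and shapes are illustrative): the diffusion state is the point coordinates themselves, so points physically drift in space during the forward process.

import torch

points = torch.randn(1, 1024, 3)  # one lidar object as an (x, y, z) set
t = torch.rand(1, 1, 1)
noisy = (1 - t) * points + t * torch.randn_like(points)
# A permutation-equivariant denoiser (e.g. a point transformer) is then
# trained to recover the clean coordinates from (noisy, t).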
November 26, 2025 at 1:19 PM
Reposted by valeo.ai
Check out NAF: an effective ViT feature upsampler to produce excellent (and eye-candy) pixel-level feature maps.

NAF outperforms both VFM-specific upsamplers (FeatUp, JAFAR) and VFM-agnostic methods (JBU, AnyUp) across multiple downstream tasks 👇
Need pixel-level features from your backbone (DINOv3, CLIP, RADIO, FRANCA...)?

🚀Introducing NAF: A universal, zero-shot feature upsampler.

It turns low-res ViT features into pixel-perfect maps.

- ⚡ Model-agnostic
- 🥇 SoTA results
- 🚀 4× faster than SoTA
- 📈 Scales up to 2K res
November 25, 2025 at 6:36 PM
📢 NAF is fully open-source!

The repo contains:
✅ Pretrained model
✅ Example notebooks
✅ Evaluation and training code

Check it out & ⭐ the repo: github.com/valeoai/NAF
November 25, 2025 at 10:44 AM
πŸ› οΈ Already have a complex, pre-trained pipeline?
If you are using bilinear interpolation anywhere, NAF acts as a strict drop-in replacement.

Just swap it in. No retraining required. It’s literally free points for your metrics.πŸ“ˆ
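The swap, concretely (the NAF call below is a placeholder, not the real interface; check github.com/valeoai/NAF for that):

import torch
import torch.nn.functional as F

feats = torch.randn(1, 256, 32, 32)

# before: plain bilinear upsampling somewhere in your pipeline
up = F.interpolate(feats, size=(128, 128), mode="bilinear")

# after: an image-guided upsampler in the same spot (placeholder call)
# up = naf(image, feats, size=(128, 128))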
November 25, 2025 at 10:44 AM