Yi-Hao Peng
@yihaopeng.bsky.social
Reposted by Yi-Hao Peng
Breaking: we're releasing a fully synthetic generalist dataset for pretraining, SYNTH, and two new SOTA reasoning models trained exclusively on it. Despite having seen only 200 billion tokens, Baguettotron is currently best-in-class in its size range. pleias.fr/blog/blogsyn...
November 10, 2025 at 5:30 PM
Reposted by Yi-Hao Peng
The first research on the fundamentals of character training -- i.e., applying modern post-training techniques to ingrain specific character traits into models.

All models, datasets, code etc released.
Really excited about this project! Sharan, the lead student author, was a joy to work with.
November 4, 2025 at 4:51 PM
Reposted by Yi-Hao Peng
Great overview of the workshops and tutorials.
My favorites:
1) CAD representation
2) synthetic data to help city-scale reconstruction
3) trends in 3D vision
4) visual chain-of-thoughts?
November 3, 2025 at 1:06 PM
Reposted by Yi-Hao Peng
Instance-Level Composed Image Retrieval
@billpsomas.bsky.social George Retsinas @nikos-efth.bsky.social Panagiotis Filntisis, Yannis Avrithis, Petros Maragos, Ondrej Chum, @gtolias.bsky.social

tl;dr: condition-based retrieval (+dataset) - old photo/sunset/night/aerial/model arxiv.org/abs/2510.25387
November 3, 2025 at 12:53 PM
Reposted by Yi-Hao Peng
💡Can we trust synthetic data for statistical inference?

We show that synthetic data (e.g., LLM simulations) can significantly improve the performance of inference tasks. The key intuition lies in the interaction between the moment residuals of the synthetic data and those of the real data.
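One way such residual interactions can be exploited, in the spirit of prediction-powered inference (a sketch of the general idea; `ppi_mean` and its arguments are my illustrative names, not the paper's exact estimator):

```python
def ppi_mean(real_labels, synth_on_real, synth_large):
    """Debiased mean estimate from a small real sample plus synthetic data.

    A large synthetic sample gives a low-variance (but possibly biased)
    estimate; the paired real-vs-synthetic residuals on the small real
    sample correct that bias.
    """
    n = len(real_labels)
    rectifier = sum(r - s for r, s in zip(real_labels, synth_on_real)) / n
    return sum(synth_large) / len(synth_large) + rectifier


# Toy usage: synthetic data is systematically biased upward by 0.5,
# and the residual term removes exactly that bias.
real = [1.0, 2.0, 3.0, 4.0]
synth_on_real = [x + 0.5 for x in real]   # synthetic predictions on real points
synth_large = [2.5] * 1000                # large biased synthetic sample
estimate = ppi_mean(real, synth_on_real, synth_large)
```

Here the naive synthetic mean would be 2.5, while the corrected estimate recovers the real mean of 2.0.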
October 10, 2025 at 4:12 PM
Reposted by Yi-Hao Peng
We have a new sequence model for robotics, which will be presented at #NeurIPS2025:

Kinaema: A recurrent sequence model for memory and pose in motion
arxiv.org/abs/2510.20261

By @mbsariyildiz.bsky.social, @weinzaepfelp.bsky.social, G. Bono, G. Monaci and myself
@naverlabseurope.bsky.social

1/9
October 24, 2025 at 7:18 AM
Reposted by Yi-Hao Peng
Minimal Solvers and Model Refinement by @visionviktor

#ICCV2025
danini.github.io/ransac-2025-...
October 20, 2025 at 11:49 PM
Reposted by Yi-Hao Peng
🚨 Does your LLM really understand code -- or is it just really good at remembering it?
We built **PLSemanticsBench** to find out.
The results: a wild mix.

✅The Brilliant:
Top reasoning models can execute complex, fuzzer-generated programs -- even with 5+ levels of nested loops! 🤯

❌The Brittle: 🧵
October 14, 2025 at 2:33 AM
Reposted by Yi-Hao Peng
Barroso-Laguna et al., "A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features"

When building context for your feed-forward 3D point-map estimator, don't use full image pairs -- just randomly subsample features! -> faster compute, more images.
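Schematically, the subsampling trick amounts to something like this (a minimal sketch, not the paper's code; the function name and budget are illustrative):

```python
import random


def subsample_features(features, k, seed=0):
    """Randomly keep at most k features per image instead of the full set.

    Dropping features shrinks the per-image context cost, so more images
    fit in the same compute budget.
    """
    rng = random.Random(seed)
    if len(features) <= k:
        return list(features)
    return rng.sample(features, k)


# Usage: keep 16 of 100 features for one image.
kept = subsample_features(list(range(100)), k=16)
```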
October 2, 2025 at 7:37 PM
Reposted by Yi-Hao Peng
Geometry Meets Vision: Revisiting Pretrained Semantics in Distilled Fields

Zhiting Mei, Ola Shorinwa, Anirudha Majumdar
tl;dr: who cares, look at those dino icons!
OK: distilling DINO into NeRF -> better object localization than VGGT.

arxiv.org/abs/2510.03104
October 6, 2025 at 10:48 AM
Reposted by Yi-Hao Peng
Can we crowdsource robot evaluations? lmsys/chatbot arena helped revolutionize LLM evaluations; is it possible to do similar things for robots?

Find out in a new episode of RoboPapers: robopapers.substack.com/p/ep34-roboa...
Ep#34: RoboArena
With Pranav Atreya and Karl Pertsch
robopapers.substack.com
October 3, 2025 at 2:04 PM
Reposted by Yi-Hao Peng
If you've been trying to figure out DSPy - the automatic prompt optimization system - this talk by @dbreunig.bsky.social is the clearest explanation I've seen yet, with a very useful real-world case study www.youtube.com/watch?v=I9Zt...

My notes here: simonwillison.net/2025/Oct/4/d...
Let the LLM Write the Prompts: An Intro to DSPy in Compound AI Pipelines
YouTube video by Databricks
www.youtube.com
October 4, 2025 at 11:05 PM
Reposted by Yi-Hao Peng
Announcing a broad expansion of the National Deep Inference Fabric.

This could be relevant to your research...
September 26, 2025 at 6:47 PM
Reposted by Yi-Hao Peng
New technical post from Thinking Machines on optimizers, but this is the main catch: conditional learning rates per layer.

thinkingmachines.ai/blog/modular...
September 26, 2025 at 6:00 PM
Reposted by Yi-Hao Peng
Tendency: render non-visual data/parameters into images fed to neural networks.

Joint angles: GENIMA genima-robot.github.io
Trajectories: VINT arxiv.org/abs/2306.14846
Object pose: MFOS arxiv.org/abs/2310.01897
Waypoints: PIVOT arxiv.org/abs/2402.07872

(Repost+update of a tweet from 2024)
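The common pattern across these works can be sketched as rasterizing non-visual signals into an image grid (my toy example, not any of the listed papers' code; names and sizes are illustrative):

```python
def render_trajectory(points, size=16):
    """Rasterize 2D waypoints (coordinates in [0,1]^2) into a binary image.

    Once rendered, a trajectory can be fed to a vision network like any
    other picture, which is the trick shared by GENIMA/VINT/MFOS/PIVOT.
    """
    img = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        img[row][col] = 1
    return img


# Usage: two waypoints at opposite corners light up two pixels.
img = render_trajectory([(0.0, 0.0), (0.99, 0.99)], size=4)
```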
September 22, 2025 at 2:07 PM
Reposted by Yi-Hao Peng
Large Vision Models Can Solve Mental Rotation Problems

Sebastian Ray Mason, Anders Gjølbye, Phillip Chavarria Højbjerg, Lenka Tětková, Lars Kai Hansen

tl;dr: DINOv3, CLIP know 3D geometry, but only in middle layers. MAE bad, ImageNet bad, cls token bad.
arxiv.org/abs/2509.15271
September 22, 2025 at 10:19 AM
Reposted by Yi-Hao Peng
Why does AI sometimes fail to generalize, and what might help? In a new paper (arxiv.org/abs/2509.16189), we highlight the latent learning gap — which unifies findings from language modeling to agent navigation — and suggest that episodic memory complements parametric learning to bridge it. Thread:
Latent learning: episodic memory complements parametric learning by enabling flexible reuse of experiences
When do machine learning systems fail to generalize, and what mechanisms could improve their generalization? Here, we draw inspiration from cognitive science to argue that one weakness of machine lear...
arxiv.org
September 22, 2025 at 4:21 AM
Reposted by Yi-Hao Peng
Explaining how to get the most out of CLI agents and why they're so important for understanding today's and tomorrow's AI progress. They're showing the new fundamentals of agents and how frontier labs will hill-climb on open-ended tasks.
buff.ly/jAAhHnQ
Coding as the epicenter of AI progress and the path to general agents
GPT-5-Codex, adoption, denial, peak performance, and everyday gains.
www.interconnects.ai
September 18, 2025 at 3:28 PM
Reposted by Yi-Hao Peng
Towards the Next Generation of 3D Reconstruction

@parskatt.bsky.social PhD Thesis.

tl;dr: would be useful in teaching image matching - nice explanations. (too) Fancy and stylish notation. Cool Ack section and cover image.

liu.diva-portal.org/smash/record...
September 18, 2025 at 6:25 AM
Reposted by Yi-Hao Peng
Can we use video diffusion to generate 3D scenes?

𝐖𝐨𝐫𝐥𝐝𝐄𝐱𝐩𝐥𝐨𝐫𝐞𝐫 (#SIGGRAPHAsia25) creates fully-navigable scenes via autoregressive video generation.

Text input -> 3DGS scene output & interactive rendering!

🌍http://mschneider456.github.io/world-explorer/
📽️https://youtu.be/N6NJsNyiv6I
September 17, 2025 at 12:08 PM
Reposted by Yi-Hao Peng
I finally got around to making a tool to compare completions from SFT vs. RLHF trained models. This is a mini site for the RLHF book that I've wanted for a while.

buff.ly/lqDL5wa

It's always been hard to say what RLHF does to a model within a more complex post-training pipeline.
September 17, 2025 at 2:20 PM
Reposted by Yi-Hao Peng
Vision-Language-Action models are the foundation of a new wave of generalist robots: networks that take in images from robot cameras (vision) and instructions (language) and produce robot trajectories. We are seeing a remarkable convergence in how these work; more: open.substack.com/pub/itcanthi...
Vision-Language-Action Models and the Search for a Generalist Robot Policy
VLAs are general-purpose robotics models. But how are VLAs doing in the real world, and which ones are people using?
open.substack.com
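The vision + language -> trajectory contract described above can be written down schematically (purely illustrative type names, not any particular model's API):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """What a VLA consumes: camera frames plus a language instruction."""
    images: List[list]   # raw camera frames (placeholder representation)
    instruction: str     # e.g. "pick up the cup"


def toy_policy(obs: Observation, horizon: int = 4) -> List[List[float]]:
    """Placeholder action head: emits a short trajectory of 7-DoF action
    vectors (e.g. end-effector deltas plus a gripper command)."""
    return [[0.0] * 7 for _ in range(horizon)]


# Usage: one observation in, a 4-step trajectory out.
traj = toy_policy(Observation(images=[], instruction="pick up the cup"))
```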
August 27, 2025 at 11:42 PM
Reposted by Yi-Hao Peng
I'll make a longer post at some point but the tl;dr is:

We take a simple (but underappreciated) closed-form solution for estimating H, and replace the homography part with whatever distortion model we have.

This gives us simpler and faster solutions across the board. Super cool work imo!
Radially Distorted Homographies, Revisited

Mårten Wadenbäck, Marcus Valtonen Örnhag, @parskatt.bsky.social

tl;dr: minimal solvers for one-sided/two-sided equal/two-sided independent radial distortion homography

arxiv.org/abs/2508.21190
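For reference, the undistorted baseline being generalized here is the classic closed-form (DLT-style) homography estimate from four correspondences (a self-contained sketch with h33 fixed to 1, not the paper's solver):

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def estimate_homography(src, dst):
    """Closed-form H from four point pairs (x,y) -> (u,v).

    With w = h7*x + h8*y + 1, each pair gives two linear equations:
      h1*x + h2*y + h3 - u*h7*x - u*h8*y = u
      h4*x + h5*y + h6 - v*h7*x - v*h8*y = v
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve_linear(A, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]
```

Swapping a radial distortion model into the projection step, as the paper does, changes the rows of this linear system but keeps the same closed-form flavor.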
September 1, 2025 at 9:59 AM
Reposted by Yi-Hao Peng
Our work on Detecting Suspense in Stories will appear at COLM'25 arxiv.org/abs/2508.15794

Do LLMs know when stories are suspenseful?

We ran LLMs through a bunch of classical psychology suspense studies

The answer: kinda, sorta? Honestly, better than I expected. But the results are nuanced...

1/
August 25, 2025 at 7:49 PM
Reposted by Yi-Hao Peng
🚨 New preprint!
How far can we go with ImageNet for Text-to-Image generation? w. @arrijitghosh.bsky.social @lucasdegeorge.bsky.social @nicolasdufour.bsky.social @vickykalogeiton.bsky.social
TL;DR: Train a text-to-image model using 1000x less data in 200 GPU hrs!

📜https://arxiv.org/abs/2502.21318
🧵👇
March 3, 2025 at 10:19 AM