We built the Flow Poke Transformer (FPT) to model multi-modal scene dynamics from sparse interactions.
It learns to predict the 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯 of motion itself 🧵👇
But most models predict 𝗼𝗻𝗲 𝗳𝘂𝘁𝘂𝗿𝗲, a single deterministic motion.
The reality is 𝘶𝘯𝘤𝘦𝘳𝘵𝘢𝘪𝘯 and 𝘮𝘶𝘭𝘵𝘪-𝘮𝘰𝘥𝘢𝘭: one poke can lead to many outcomes.
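A tiny numpy illustration (mine, not from the paper) of why this matters: when the outcome is bimodal, the MSE-optimal deterministic prediction is the conditional mean, which is not a valid outcome at all.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "poke" dataset: the same poke makes a part move either left (-1)
# or right (+1) with equal probability -- a bimodal outcome.
dx = rng.choice([-1.0, 1.0], size=10_000)

# A deterministic regressor trained with MSE converges to the conditional
# mean, because the mean minimizes expected squared error.
print(dx.mean())          # ~0.0: predicts "no motion", which never happens

# A distributional model keeps both modes instead of averaging them away.
print((dx < 0).mean(), (dx > 0).mean())   # ~0.5 and ~0.5
```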
Predict 𝗱𝗶𝘀𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻𝘀 of motion, not just one flow field instance.
Given a few pokes, our model outputs the probability 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯 of how parts of the scene might move.
→ This directly captures 𝘶𝘯𝘤𝘦𝘳𝘵𝘢𝘪𝘯𝘵𝘺 and interactions.
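As a concrete mental model, here is a hypothetical FPT-style interface in PyTorch. The function name, tensor shapes, and the Gaussian-mixture output are assumptions for illustration, not the paper's actual API:

```python
import torch

def fake_fpt(image, pokes, queries, k=4):
    """Stand-in for the real model: returns random Gaussian-mixture
    parameters with the shapes a distributional flow predictor might use.
    (Entirely hypothetical -- for illustrating the interface only.)"""
    b, q, _ = queries.shape
    means = torch.randn(b, q, k, 2)      # per-component 2D flow means
    log_stds = torch.zeros(b, q, k, 2)   # per-component diagonal scales
    logits = torch.zeros(b, q, k)        # mixture weights (pre-softmax)
    return means, log_stds, logits

image = torch.rand(1, 3, 256, 256)               # conditioning frame
pokes = torch.tensor([[[0.3, 0.4, 0.1, 0.0]]])   # (B, N, 4): x, y, dx, dy
queries = torch.rand(1, 512, 2)                  # (B, Q, 2): points to query

means, log_stds, logits = fake_fpt(image, pokes, queries)
print(means.shape)   # torch.Size([1, 512, 4, 2]): a distribution per point
```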
FPT 𝘳𝘦𝘱𝘳𝘦𝘴𝘦𝘯𝘵𝘴 𝘵𝘩𝘦 𝘧𝘶𝘭𝘭 𝘮𝘰𝘵𝘪𝘰𝘯 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯, enabling:
• interpretable uncertainty
• controllable interaction effects
• efficient prediction (>100k predictions/s)
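A toy sketch of what the bullets above can mean in code, keeping the Gaussian-mixture assumption from the sketch before (none of this is the paper's implementation): once a model emits mixture parameters per query point, uncertainty can be read directly off the distribution, and sampling is cheap and fully vectorized.

```python
import torch
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

# Hypothetical per-point output: a K-component Gaussian mixture over 2D flow
# at each of Q query points (random parameters stand in for a real model).
B, Q, K = 1, 512, 4
means = torch.randn(B, Q, K, 2)
stds = torch.rand(B, Q, K, 2) + 0.1
logits = torch.zeros(B, Q, K)

mix = MixtureSameFamily(
    Categorical(logits=logits),            # component weights
    Independent(Normal(means, stds), 1),   # diagonal 2D Gaussians
)

# Interpretable uncertainty: total flow variance per query point,
# high wherever the outcome of a poke is ambiguous.
uncertainty = mix.variance.sum(-1)         # (B, Q)

# Efficient prediction: one forward pass yields all parameters, and mixture
# sampling is vectorized across every query point at once -- the kind of
# setup that makes throughputs like >100k predictions/s plausible.
flow_samples = mix.sample((8,))            # (8, B, Q, 2) plausible motions
print(uncertainty.shape, flow_samples.shape)
```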
• Articulated motion (Drag-A-Move): fine-tuned FPT outperforms specialized models for motion prediction
• Face motion: zero-shot, beats specialized baselines
• Moving-part segmentation: emerges directly from the formulation
It makes so much more sense intuitively to work on a sequence level rather than a token level when our rewards are given on a sequence level.
Makes everything behave much better (see below) and makes more sense from a theoretical perspective, too.
Paper: www.arxiv.org/abs/2507.18071
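A minimal sketch of the contrast being described: token-level vs. sequence-level importance ratios in a clipped policy-gradient objective with a single sequence-level reward. The length-normalization and clipping details here are assumptions; see the linked paper for the actual method.

```python
import torch

# Token log-probs under the new and old policies for a batch of sequences.
logp_new = torch.randn(4, 16)                        # (batch, seq_len)
logp_old = (logp_new + 0.05 * torch.randn(4, 16)).detach()
advantage = torch.randn(4, 1)    # one sequence-level advantage per sample
eps = 0.2

# Token level: one importance ratio (and one clip decision) per token,
# even though the reward never distinguishes tokens within a sequence.
ratio_tok = (logp_new - logp_old).exp()              # (4, 16)
loss_tok = -torch.min(ratio_tok * advantage,
                      ratio_tok.clamp(1 - eps, 1 + eps) * advantage).mean()

# Sequence level: a single length-normalized ratio per sequence (the
# geometric mean of the token ratios), clipped once per sequence --
# matching the granularity of the reward.
ratio_seq = (logp_new - logp_old).mean(dim=-1).exp() # (4,)
adv = advantage.squeeze(-1)
loss_seq = -torch.min(ratio_seq * adv,
                      ratio_seq.clamp(1 - eps, 1 + eps) * adv).mean()
```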