Stefan Baumann
@stefanabaumann.bsky.social
PhD Student at @compvis.bsky.social & @ellis.eu working on generative computer vision.

Interested in extracting world understanding from models and more controlled generation. 🌐 https://stefan-baumann.eu/
Classic case of xkcd 2501
October 17, 2025 at 4:58 PM
⚡️ FPT generalizes from open-set training. Applications:
• Articulated motion (Drag-A-Move): fine-tuned FPT outperforms specialized models for motion prediction
• Face motion: zero-shot, beats specialized baselines
• Moving part segmentation: emerges directly from the formulation
October 15, 2025 at 1:58 AM
⚙️ Unlike other methods, we don't regress or sample one trajectory.
FPT 𝘳𝘦𝘱𝘳𝘦𝘴𝘦𝘯𝘵𝘴 𝘵𝘩𝘦 𝘧𝘶𝘭𝘭 𝘮𝘰𝘵𝘪𝘰𝘯 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯, enabling:
• interpretable uncertainty
• controllable interaction effects
• efficient prediction (>100k predictions/s, see the sketch below)
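Not the paper's code, but a rough sketch of why a distributional output buys you these properties, assuming the per-point motion distribution is parameterized as a small Gaussian mixture (all tensor names, shapes, and values here are made up for illustration):

```python
import torch

# Hypothetical GMM parameters for N query points with K mixture modes each,
# as a model head might emit them; all names and shapes are illustrative.
N, K = 4096, 8
weights = torch.softmax(torch.randn(N, K), dim=-1)   # mode probabilities [N, K]
means = torch.randn(N, K, 2)                         # per-mode 2D flow means
scales = torch.rand(N, K, 2) + 0.1                   # per-mode std deviations

mix = torch.distributions.MixtureSameFamily(
    torch.distributions.Categorical(probs=weights),
    torch.distributions.Independent(torch.distributions.Normal(means, scales), 1),
)

samples = mix.sample()                                 # [N, 2] plausible flows
expected = (weights.unsqueeze(-1) * means).sum(dim=1)  # [N, 2] mean flow
# Weighted spread of the modes around the mean: a simple, interpretable
# per-point uncertainty measure.
spread = (weights * (means - expected.unsqueeze(1)).norm(dim=-1)).sum(dim=1)
```

Since everything is batched over query points, sampling and uncertainty estimates like these stay fully vectorized, which is what makes throughputs on the order of >100k predictions/s plausible.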
October 15, 2025 at 1:57 AM
💡 Our idea:
Predict 𝗱𝗶𝘀𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻𝘀 of motion, not just one flow field instance.

Given a few pokes, our model outputs the probability 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯 of how parts of the scene might move.

→ This directly captures 𝘶𝘯𝘤𝘦𝘳𝘵𝘢𝘪𝘯𝘵𝘺 and interactions.
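To make that concrete, here's a hedged toy sketch of a head that turns per-query features (assumed to already encode the image and the pokes) into a motion distribution; the GMM parameterization, the `MotionDistributionHead` name, and all shapes are my own illustrative assumptions, not the paper's actual architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionDistributionHead(nn.Module):
    """Toy head mapping per-query features (assumed to already fuse the
    image and the pokes) to a K-mode Gaussian mixture over 2D motion."""

    def __init__(self, dim: int = 256, modes: int = 8):
        super().__init__()
        self.modes = modes
        self.proj = nn.Linear(dim, modes * 5)  # per mode: 1 weight + 2 mean + 2 scale

    def forward(self, query_features: torch.Tensor):
        # query_features: [N, dim] -> GMM parameters per query point
        p = self.proj(query_features).view(-1, self.modes, 5)
        weights = torch.softmax(p[..., 0], dim=-1)      # [N, K]
        means = p[..., 1:3]                             # [N, K, 2]
        scales = F.softplus(p[..., 3:5]) + 1e-3         # [N, K, 2], kept positive
        return weights, means, scales

head = MotionDistributionHead()
weights, means, scales = head(torch.randn(4096, 256))
```

A mixture with several modes is one simple way to express multi-modality: the same poke can put probability mass on several distinct outcomes instead of averaging them into one blurry prediction.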
October 15, 2025 at 1:57 AM
🧠 Understanding how the world 𝘤𝘰𝘶𝘭𝘥 change is core to physical intelligence.

But most models predict 𝗼𝗻𝗲 𝗳𝘂𝘁𝘂𝗿𝗲, a single deterministic motion.

The reality is 𝘶𝘯𝘤𝘦𝘳𝘵𝘢𝘪𝘯 and 𝘮𝘶𝘭𝘵𝘪-𝘮𝘰𝘥𝘢𝘭: one poke can lead to many outcomes.
October 15, 2025 at 1:57 AM
🤔 What happens when you poke a scene — and your model has to predict how the world moves in response?

We built the Flow Poke Transformer (FPT) to model multi-modal scene dynamics from sparse interactions.

It learns to predict the 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯 of motion itself 🧵👇
October 15, 2025 at 1:56 AM
tl;dr: do importance weighting/sampling at the sequence level, not the token level.
Makes everything behave much better (see below) and makes more sense from a theoretical perspective, too.

Paper: www.arxiv.org/abs/2507.18071
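For intuition, a minimal sketch of the token-level vs. sequence-level distinction (my reading of the paper; the log-prob values are made up):

```python
import torch

# Made-up per-token log-probs of one sampled response under the current
# and old policy; shapes and values are purely illustrative.
logp_new = torch.tensor([-1.2, -0.8, -2.1, -0.5])
logp_old = torch.tensor([-1.0, -0.9, -2.0, -0.6])

# Token-level (GRPO-style): one importance ratio per token, so a single
# badly off-policy token can dominate the update.
token_ratios = torch.exp(logp_new - logp_old)

# Sequence-level (GSPO-style): one ratio for the whole response,
# length-normalized (a geometric mean over tokens) so it stays comparable
# across responses of different lengths.
seq_ratio = torch.exp((logp_new.sum() - logp_old.sum()) / logp_new.numel())
```

One length-normalized ratio per response also means clipping acts on whole sequences rather than individual tokens, which lines up with the reward being defined per sequence.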
July 26, 2025 at 7:43 PM
I'm calling it now, GSPO will be the next big hype in LLM RL algos after GRPO.

It makes so much more sense intuitively to work at the sequence level rather than the token level when our rewards are given at the sequence level.
July 26, 2025 at 7:41 PM
Absolutely 100% this. Who would want to read papers like VGG-T?
July 25, 2025 at 9:41 AM
What a PhD, fantastic work! Two incredible banger papers already, and now this one. And all that in just three years. Really looking forward to what you'll be up to next!
February 14, 2025 at 10:56 PM
My i's, it i's
December 4, 2024 at 10:28 AM
sup
November 24, 2024 at 6:04 PM