Stefan Baumann
stefanabaumann.bsky.social
PhD Student at @compvis.bsky.social & @ellis.eu working on generative computer vision. Interested in extracting world understanding from models and more controlled generation. 🌐 https://stefan-baumann.eu/
Pinned
Ever wondered if diffusion features could do better without all the noise? 🤔

Turns out they can! We show how adapting the backbone unlocks clean, powerful features for better results across the board. 🚀🧹

Check it out! ⬇️
🤔 Why do we extract diffusion features from noisy images? Isn’t that destroying information?

Yes, it is - but we found a way to do better. 🚀

Here’s how we unlock better features, no noise, no hassle.

📝 Project Page: compvis.github.io/cleandift
💻 Code: github.com/CompVis/clea...

🧵👇
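To see why extracting features from noised inputs is lossy, here's a minimal numpy sketch of the standard DDPM forward process, x_t = √ᾱ·x_0 + √(1−ᾱ)·ε, on a toy 1-D signal. (This is a generic illustration of the noising step the thread refers to, not CleanDIFT's actual feature-extraction code.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": a 1-D signal standing in for clean pixel data x_0.
x0 = np.sin(np.linspace(0, 4 * np.pi, 256))

def noise(x0, alpha_bar, rng):
    """DDPM forward process: x_t = sqrt(a_bar)*x_0 + sqrt(1 - a_bar)*eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

# How much of x_0 survives in x_t at different noise levels?
corrs = {}
for alpha_bar in (0.99, 0.5, 0.01):
    xt = noise(x0, alpha_bar, rng)
    corrs[alpha_bar] = np.corrcoef(x0, xt)[0, 1]
    print(f"alpha_bar={alpha_bar:.2f}  corr(x0, xt)={corrs[alpha_bar]:.2f}")
```

The correlation with the clean signal drops toward zero as ᾱ shrinks: any feature computed from a heavily noised x_t necessarily sees a degraded version of the image, which is exactly the information loss the thread is about.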
The work I linked relates to pretraining, too. Doing this for multiple rewards at once is indeed an aspect I haven't seen before; I was just curious whether I was missing something about the general method

Lovely work!
Let's make everything generative! There's no reason to forgo having an (at least implicit) distribution for every prediction we make, if we can get it at least as accurate and similarly efficient as discriminative baselines in the long run
Reposted by Stefan Baumann
Excited to share that we'll be presenting four papers at the main conference at ICCV 2025 this week!

Come say hi in Honolulu!

👋 Pingchuan, Ming, Felix, Stefan, Timy, and Björn Ommer will be attending.
Reposted by Stefan Baumann
🤔 What if you could generate an entire image using just one continuous token?

💡 It works if we leverage a self-supervised representation!

Meet RepTok🦎: A generative model that encodes an image into a single continuous latent while keeping realism and semantics. 🧵 👇
Thank you! I think that might be possible, although I'd likely consider incorporating more information in that case
All of this wouldn't have been possible without the support of my amazing collaborators
@rmsnorm.bsky.social, @timyphan.bsky.social, and Björn Ommer at @compvis.bsky.social. A giant thank you to them! ❤️
⚡️ FPT generalizes from open-set training. Applications:
• Articulated motion (Drag-A-Move): fine-tuned FPT outperforms specialized models for motion prediction
• Face motion: zero-shot, beats specialized baselines
• Moving part segmentation: emerges from formulation
⚙️ Unlike other methods, we don't regress or sample one trajectory.
FPT 𝘳𝘦𝘱𝘳𝘦𝘴𝘦𝘯𝘵𝘴 𝘵𝘩𝘦 𝘧𝘶𝘭𝘭 𝘮𝘰𝘵𝘪𝘰𝘯 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯, enabling:
• interpretable uncertainty
• controllable interaction effects
• efficient prediction (>100k predictions/s)
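A minimal sketch of what "representing the full motion distribution" can buy you, using a 2-component Gaussian mixture over 2-D flow vectors as a stand-in (the thread doesn't specify FPT's actual parameterization, so the mixture form, weights, and means here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-point prediction: a 2-component Gaussian mixture over
# 2-D flow vectors (dx, dy). One poke, two plausible outcomes — e.g. a
# door swinging open vs. staying shut.
weights = np.array([0.7, 0.3])
means = np.array([[1.0, 0.0],   # mode 1: move right
                  [0.0, 0.0]])  # mode 2: stay put
stds = np.array([0.1, 0.05])

def sample_flow(n, rng):
    """Draw n flow vectors from the mixture (ancestral sampling)."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return means[comp] + stds[comp, None] * rng.standard_normal((n, 2))

samples = sample_flow(10_000, rng)
mean_dx = samples.mean(axis=0)[0]  # ≈ 0.7*1.0 + 0.3*0.0

# The entropy over mixture weights is one cheap, interpretable
# uncertainty measure a distributional prediction gives you for free.
entropy = -np.sum(weights * np.log(weights))
print(mean_dx, entropy)
```

A regressed point estimate would collapse to the mean flow (≈0.7 to the right), a motion that neither mode actually exhibits; sampling from the distribution keeps both outcomes.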
💡 Our idea:
Predict 𝗱𝗶𝘀𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻𝘀 of motion, not just one flow field instance.

Given a few pokes, our model outputs the probability 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯 of how parts of the scene might move.

→ This directly captures 𝘶𝘯𝘤𝘦𝘳𝘵𝘢𝘪𝘯𝘵𝘺 and interactions.
🧠 Understanding how the world 𝘤𝘰𝘶𝘭𝘥 change is core to physical intelligence.

But most models predict 𝗼𝗻𝗲 𝗳𝘂𝘁𝘂𝗿𝗲, a single deterministic motion.

The reality is 𝘶𝘯𝘤𝘦𝘳𝘵𝘢𝘪𝘯 and 𝘮𝘶𝘭𝘵𝘪-𝘮𝘰𝘥𝘢𝘭: one poke can lead to many outcomes.
🤔 What happens when you poke a scene — and your model has to predict how the world moves in response?

We built the Flow Poke Transformer (FPT) to model multi-modal scene dynamics from sparse interactions.

It learns to predict the 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯 of motion itself 🧵👇
Oh yeah, sorry, I should've made it more clear that I was talking in the more general case
Take, for example, (zero-shot) semantic correspondence working quite well based on activations of image diffusion models.

The model has never been trained for it, and, while it's obvious that related capabilities might be useful for denoising, I'd still consider this an emergent capability
Not in the sense of, e.g., generating new kinds of videos when the model was trained for video generation, but capabilities w.r.t. other tasks could still be considered emergent, right?
First time I've ever heard someone from the 3D CV community actually say this out loud! This has been bugging me for a long time
Why are you not on a current stable version?
The bugs I ran into reproduce across 2.7, 2.8 and current nightlies
Welcome to the club! I've somehow managed to find two bugs with torch.compile() in the last few days 🥲
Reposted by Stefan Baumann
“Everyone knows” what an autoencoder is… but there's an important complementary picture missing from most introductory material.

In short: we emphasize how autoencoders are implemented—but not always what they represent (and some of the implications of that representation).🧵
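One concrete instance of the "what it represents" view: a linear autoencoder trained with MSE doesn't learn an arbitrary compression; its optimal bottleneck reconstructs via projection onto the top principal subspace (up to an invertible map in code space). A numpy sketch with synthetic data (the low-rank data setup here is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Data whose variance lives mostly in a 2-D subspace of R^5.
Z = rng.standard_normal((500, 2)) * np.array([3.0, 2.0])
A = rng.standard_normal((2, 5))
X = Z @ A + 0.05 * rng.standard_normal((500, 5))
X -= X.mean(axis=0)

# The MSE-optimal linear autoencoder with a k-dim bottleneck is
# equivalent to projecting onto the top-k principal components.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2

def encode(x):
    return x @ Vt[:k].T  # "encoder": project onto top-k PCs

def decode(z):
    return z @ Vt[:k]    # "decoder": lift back to R^5

X_hat = decode(encode(X))
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3f}")
```

The implementation is just two matrix multiplies; the interesting part is the representation — here, the learned code is (a linear transform of) the data's dominant directions of variation, which is the kind of "what does it represent" framing the post is pointing at.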
That process really sounds like a labor of love! Penrose looks really interesting, I'll play around with it! Thanks!