Aleksander Hołyński
@holynski.bsky.social
UC Berkeley + Google DeepMind
holynski.org
Reposted by Aleksander Hołyński
We're very excited to introduce TAPNext: a model that sets a new state of the art for Tracking Any Point in videos by formulating the task as Next Token Prediction. For more, see: tap-next.github.io
April 9, 2025 at 2:04 PM
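To make "tracking as next-token prediction" concrete, here is a toy sketch, not TAPNext's actual code or API, and it omits the visual conditioning a real tracker uses: a point track is quantized into grid-cell tokens and a small causal model predicts the next token from the history. Every name in it is made up for illustration.

```python
# Toy sketch (not TAPNext's real code or API): casting point tracking as
# next-token prediction by quantizing (x, y) positions into a vocabulary
# of grid-cell tokens and predicting the next token autoregressively.
import torch
import torch.nn as nn

GRID = 32                      # quantize positions onto a 32x32 grid
VOCAB = GRID * GRID            # one token per grid cell

def xy_to_token(x, y):
    """Map normalized coords in [0, 1) to a single discrete token id."""
    return int(y * GRID) * GRID + int(x * GRID)

class TinyTrackLM(nn.Module):
    """Minimal causal sequence model over position tokens."""
    def __init__(self, dim=64):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, VOCAB)

    def forward(self, tokens):                      # tokens: (B, T)
        h = self.embed(tokens)
        T = tokens.shape[1]
        mask = nn.Transformer.generate_square_subsequent_mask(T)
        h = self.encoder(h, mask=mask)              # causal attention
        return self.head(h)                         # next-token logits

# Usage: predict where the tracked point goes next, given its history.
track = [xy_to_token(0.10 + 0.02 * t, 0.50) for t in range(8)]
model = TinyTrackLM()
logits = model(torch.tensor([track]))
print("predicted next grid cell:", logits[0, -1].argmax().item())
```

The appeal of this framing is that tracking inherits the standard sequence-modeling toolkit: causal attention, autoregressive decoding, and next-token training.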
Reposted by Aleksander Hołyński
Introducing MegaSaM!

Accurate, fast, & robust structure + camera estimation from casual monocular videos of dynamic scenes!

MegaSaM outputs camera parameters and consistent video depth, scaling to long videos with unconstrained camera paths and complex scene dynamics!
December 6, 2024 at 5:43 PM
I love SfM, but it's way less useful than it should be because of a handful of characteristic failures.

@zhengqi_li's new paper basically solves them all:

-No parallax? ✅
-No calibration? ✅
-Dynamic scenes? ✅
-Dense geometry? ✅

Best of all, it's super fast.
December 6, 2024 at 6:36 PM
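As a purely illustrative sketch of the outputs described above (per-frame camera parameters plus consistent video depth), here is one way such results could be packaged. The class, fields, and function are hypothetical, not MegaSaM's released interface.

```python
# Hypothetical sketch only -- not MegaSaM's actual code or API. It just
# illustrates the outputs the post describes: per-frame camera intrinsics,
# camera-to-world poses, and a consistent depth map for each video frame.
from dataclasses import dataclass
import numpy as np

@dataclass
class VideoSfMResult:
    intrinsics: np.ndarray   # (T, 3, 3) per-frame K matrices
    poses: np.ndarray        # (T, 4, 4) camera-to-world transforms
    depths: np.ndarray       # (T, H, W) per-frame depth maps

    def unproject(self, t: int, u: int, v: int) -> np.ndarray:
        """Lift pixel (u, v) of frame t to a 3D point in world coordinates."""
        K, pose, d = self.intrinsics[t], self.poses[t], self.depths[t, v, u]
        ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0]) * d   # camera space
        return (pose @ np.append(ray_cam, 1.0))[:3]              # world space

# Example with dummy data for a 10-frame, 480x640 video.
T, H, W = 10, 480, 640
result = VideoSfMResult(
    intrinsics=np.tile(np.array([[500, 0, 320], [0, 500, 240], [0, 0, 1.0]]), (T, 1, 1)),
    poses=np.tile(np.eye(4), (T, 1, 1)),
    depths=np.ones((T, H, W)),
)
print(result.unproject(t=0, u=320, v=240))  # point ~1 unit in front of the camera
```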
Reposted by Aleksander Hołyński
🎥 Introducing MultiFoley, a video-aware audio generation method with multimodal controls! 🔊
We can:
⌨️ Make a typewriter sound like a piano 🎹
🐱 Make a cat's meow sound like a lion's roar! 🦁
⏱️ Perfectly time existing SFX 💥 to a video.

arXiv: arxiv.org/abs/2411.17698
website: ificl.github.io/MultiFoley/
November 27, 2024 at 2:58 AM
Reposted by Aleksander Hołyński
Quark is out!
Come check out our work on generalized real-time 3D reconstruction.
quark-3d.github.io

PS: We're looking for interns!
November 28, 2024 at 3:45 PM
Reposted by Aleksander Hołyński
Stop watching videos, start interacting with worlds.

Stoked to share CAT4D, our new method for turning videos into dynamic 3D scenes that you can move through in real time!
cat-4d.github.io
arxiv.org/abs/2411.18613
November 28, 2024 at 2:52 AM
Check out CAT4D: our new paper that turns (text, sparse images, videos) => (dynamic 3D scenes)!

I can't get over how cool the interactive demo is.

Try it out for yourself on the project page: cat-4d.github.io
November 28, 2024 at 2:52 AM
Reposted by Aleksander Hołyński
We just dropped CAT4D, text to dynamic 3D models that you can render in real time. Not posting a video because Bluesky is garbage in this respect; go straight to the real-time viewer on a desktop browser and look around. The cat kneading dough is my favorite.
cat-4d.github.io
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
We present CAT4D, a method for creating 4D (dynamic 3D) scenes from monocular video. CAT4D leverages a multi-view video diffusion model trained on a diverse combination of datasets to enable novel vie...
November 28, 2024 at 2:50 AM
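For intuition about what "a dynamic 3D scene you can move through" exposes, here is a minimal hypothetical sketch of the query interface, assuming the reconstructed 4D representation can be rendered at any camera pose and timestamp. None of these names come from the CAT4D code or viewer.

```python
# Toy sketch (hypothetical names, not CAT4D's code): the interface a dynamic
# 3D ("4D") scene exposes once reconstructed -- render an image for any
# camera pose at any timestamp, which is what makes the viewer interactive.
import numpy as np

class DynamicScene:
    """Stand-in for a reconstructed 4D scene representation."""
    def __init__(self, duration_s: float):
        self.duration_s = duration_s

    def render(self, camera_pose: np.ndarray, t: float, hw=(480, 640)) -> np.ndarray:
        """Return an H x W x 3 image for the given pose and time (dummy output here)."""
        assert camera_pose.shape == (4, 4) and 0.0 <= t <= self.duration_s
        return np.zeros((*hw, 3), dtype=np.uint8)   # a real model would render the scene

# Freeze time while orbiting the camera, or freeze the camera while time plays.
scene = DynamicScene(duration_s=2.0)
frame = scene.render(camera_pose=np.eye(4), t=1.0)
print(frame.shape)  # (480, 640, 3)
```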
Reposted by Aleksander Hołyński
Our group at Google DeepMind is now accepting intern applications for summer 2025. Attached is the official "call for interns" email; the links and email aliases that got lost in the screenshot are below.
November 25, 2024 at 9:55 PM