Bilawal Sidhu
@bilawal.bsky.social
🪄 Blending Realities | 🎙️ Host, TED AI Show | 🚀 Scout, A16z | 🎬 1.4M+ Subs & 450M+ Views | 🌎 Ex-Google PM, 3D Maps & AR/VR 🥽 https://spatialintelligence.ai https://bilawal.ai
It's one of those through lines that emerge when tackling a timeless mission like mapping the world or spatial computing: VR content created for immersion becomes the foundation for teaching machines to understand how the world moves. Sometimes innovation chains together in unexpected ways! stereo4d.github.io
Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos
Use stereo videos from the internet to create a dataset of over 100,000 real-world 4D scenes with metric scale and long-term 3D motion trajectories.
stereo4d.github.io
December 15, 2024 at 2:29 PM
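For concreteness, here's a minimal sketch of what a single record in a dataset like this might look like, going only by the page's description (metric scale plus long-term 3D motion trajectories). The field names and shapes are my own illustrative guesses, not the released schema.

```python
# Hypothetical schema for one clip in a Stereo4D-style dataset. Field names and
# shapes are illustrative guesses, not the released format.
from dataclasses import dataclass
import numpy as np

@dataclass
class Stereo4DClip:
    clip_id: str
    frames: np.ndarray        # (T, H, W, 3) rectified RGB frames
    depth: np.ndarray         # (T, H, W) metric depth in meters, from stereo
    intrinsics: np.ndarray    # (3, 3) camera intrinsics
    extrinsics: np.ndarray    # (T, 4, 4) world-from-camera poses per frame
    trajectories: np.ndarray  # (N, T, 3) long-term 3D point tracks in meters
    visibility: np.ndarray    # (N, T) bool mask: is each track visible per frame

def total_track_motion(clip: Stereo4DClip) -> np.ndarray:
    """Total 3D distance (meters) each tracked point travels over the clip."""
    deltas = np.diff(clip.trajectories, axis=1)          # (N, T-1, 3)
    return np.linalg.norm(deltas, axis=-1).sum(axis=1)   # (N,)
```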
And given we're dealing with real stereoscopic content, results are notably better than with synthetic data, giving you a faithful rendition of the real world with a diverse set of subject matter.
December 15, 2024 at 2:29 PM
They're using it to train a model called DynaDUSt3R that can predict both 3D structure and motion from video frames, which means it tracks how objects move between frames while simultaneously reconstructing their 3D shape.
December 15, 2024 at 2:29 PM
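To make that "structure and motion in one pass" idea concrete, here's a toy sketch of the interface such a model might expose. The class and method names are hypothetical placeholders, not the paper's actual code, and the "network" is just a stub so the sketch runs.

```python
# Toy sketch of a DynaDUSt3R-style prediction: one forward pass on a pair of
# frames yields per-pixel 3D pointmaps for both frames plus the motion linking
# them. Names and outputs here are hypothetical, not the paper's API.
import numpy as np

class DynaDust3rLike:
    def predict(self, frame_a: np.ndarray, frame_b: np.ndarray):
        """Return (points_a, points_b, motion_a_to_b), each of shape (H, W, 3)."""
        # Placeholder for the learned network: a real model would regress these
        # from the image pair. Zero-filled outputs keep the sketch runnable.
        h, w = frame_a.shape[:2]
        zeros = np.zeros((h, w, 3), dtype=np.float32)
        return zeros, zeros.copy(), zeros.copy()

def per_pixel_speed(model: DynaDust3rLike, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """How far each pixel's 3D point moves (meters) between the two frames."""
    _, _, motion = model.predict(a, b)
    return np.linalg.norm(motion, axis=-1)   # (H, W)

speed = per_pixel_speed(DynaDust3rLike(), np.zeros((480, 640, 3)), np.zeros((480, 640, 3)))
```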
It was always clear that stereo datasets would be valuable -- and we launched some cool VR tools with it back in 2017 (link below). But the game changer now in 2024 is the scale -- they're providing 110K clips :-) That's the kind of massive, real-world dataset that was just a dream in those days!
December 15, 2024 at 2:29 PM
5. The image-to-video (remix) feature is cool, but CLEARLY needs a UI like Kling/Runway’s motion paint so it isn’t a chaotic mess / constant game of slot-machine AI

Will be interesting to do head-to-head comparisons with US and Chinese models now that Sora is live.
December 9, 2024 at 4:52 PM
3. Physics still very wonky (no magic fix yet) – the rhino slides all across the ground; phones appear/disappear like a magic trick

4. Wow is there a lot of news footage in the training data – generating grainy nighttime footage is no problem at all
December 9, 2024 at 4:52 PM
1. Sora is VERY good at generating high-frequency detail (video doesn’t seem blurry at all) – it’s the most impressive quality to me

2. As expected, Sora is great at well-imaged landmarks – AI’s ability to generate custom “stock” footage remains promising
December 9, 2024 at 4:52 PM
Very cool! Would love to see a workflow breakdown
December 5, 2024 at 3:28 AM
The race for building the biggest, baddest world model is very much on. Meanwhile, all I can think is "if only Stadia was still around!"

Check out the various results (and some fun outtakes) below: deepmind.google/discover/blo...
Genie 2: A large-scale foundation world model
Generating unlimited diverse training environments for future general agents
deepmind.google
December 4, 2024 at 5:07 PM
Not quite ready for prime time, but promising on two fronts:

1. For game developers: enabling rapid prototyping of interactive experiences straight from concept art

2. For AI research: providing unlimited, diverse 3D environments for training and testing AI agents (see the sketch below)
December 4, 2024 at 5:07 PM
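To illustrate that second point, here's a rough sketch of what "world model as training environment" could look like as a standard reset/step agent loop. Everything in it, the class, the method names, the stand-in dynamics, is hypothetical rather than DeepMind's actual interface.

```python
# Hypothetical sketch: treating a Genie 2-style world model as an RL environment.
# The "world model" is a stand-in that returns noisy frames so the loop runs
# end to end; a real system would generate the next frame from the action.
import numpy as np

class WorldModelEnv:
    """Stand-in for a generated interactive 3D environment."""
    def __init__(self, prompt_image: np.ndarray, horizon: int = 600):
        self.start = prompt_image.astype(np.float32)  # concept art seeds the world
        self.frame = self.start
        self.steps, self.horizon = 0, horizon

    def reset(self) -> np.ndarray:
        self.frame, self.steps = self.start.copy(), 0
        return self.frame

    def step(self, action: int):
        # A real world model would roll the scene forward conditioned on the
        # agent's action; here we just perturb the frame to keep the loop going.
        self.frame = np.clip(self.frame + np.random.randn(*self.frame.shape), 0, 255)
        self.steps += 1
        done = self.steps >= self.horizon   # bounded rollout length
        return self.frame, 0.0, done, {}

# A random agent exploring the generated world for one episode.
env = WorldModelEnv(prompt_image=np.zeros((128, 128, 3)))
obs, done = env.reset(), False
while not done:
    obs, reward, done, _ = env.step(action=int(np.random.randint(4)))
```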
Right now Genie 2 can generate consistent worlds for up to a minute. And this world model seems to generate larger 3D worlds than what World Labs showcased yesterday. Plus they're dynamic vs. static worlds – the foliage moves in the wind, the water ripples etc.
December 4, 2024 at 5:07 PM
It's the same reason people browse Zillow houses or watch shows about mansions. AI or not — software reviews simply don't hit the same.
December 4, 2024 at 1:16 AM