Sjoerd van Steenkiste
@svansteenkiste.bsky.social
Researching AI models that can make sense of the world @GoogleAI. Gemini Thinking.
Our team @GoogleAI is hiring an intern. We are interested in having LMs understand and respond to users better. Topics include: teaching LMs to build “mental models” of users; improving LMs' reasoning capabilities over long contexts.

@GoogleAI internship deadline is Feb 28.
February 27, 2025 at 6:10 PM
Can language models perform implicit Bayesian inference over user preference states? Come find out at the “System-2 Reasoning at Scale” #NeurIPS2024 workshop, 11:30pm West Ballroom B.
December 15, 2024 at 6:36 PM
Neural Assets poster is happening now. Join us at East Exhibit Hall A-C #1507
December 12, 2024 at 7:15 PM
Excited to be at #NeurIPS2024. A few papers we are presenting this week:

MooG: arxiv.org/abs/2411.05927
Neural Assets: arxiv.org/abs/2406.09292
Probabilistic reasoning in LMs: openreview.net/forum?id=arYXg…

Let’s connect if any of these research topics interest you!
December 11, 2024 at 12:54 AM
Even in comparison to specialized architectures for downstream tasks, such as TAPIR for point tracking, we find that self-supervised MooG latents yield strong performance.
November 20, 2024 at 6:04 PM
MooG can provide a strong foundation for different downstream vision tasks, including point tracking, monocular depth estimation, and object tracking. Especially when reading out from frozen representations, MooG tends to outperform on-the-grid baselines.
November 20, 2024 at 6:04 PM
We demonstrate the usefulness of MooG’s learned representation both qualitatively and quantitatively by training readouts on top of the learned representation on a variety of downstream tasks.
November 20, 2024 at 6:04 PM
Inspired by prior methods using slots or queries, MooG uses cross-attention to disentangle the representation structure from the image structure. Combined with a next-frame prediction loss, this results in latent tokens that bind to specific scene structures and track them as they move.
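The update described above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: learned projection matrices, multi-head attention, and the next-frame prediction decoder are all omitted, and the function names (`cross_attention`, `update_tokens`) are hypothetical. It shows only the core idea of latent tokens that cross-attend to per-frame features and carry their state forward recurrently.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # queries: (num_tokens, d); keys/values: (num_patches, d).
    # Each latent token attends over the frame's patch features.
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ values

def update_tokens(tokens, frame_features):
    # Recurrent stage-wise update: tokens read from the current frame
    # via cross-attention; the residual connection carries their state
    # to the next frame, so tokens can track structure over time.
    return tokens + cross_attention(tokens, frame_features, frame_features)

rng = np.random.default_rng(0)
num_tokens, num_patches, d = 8, 64, 16
tokens = rng.normal(size=(num_tokens, d))          # "off-the-grid" latents
video = rng.normal(size=(5, num_patches, d))       # 5 frames of patch features

for frame in video:
    tokens = update_tokens(tokens, frame)

print(tokens.shape)  # (8, 16)
```

Because the tokens are queries rather than grid cells, nothing ties a token to a fixed image location; in the full model the prediction loss encourages each token to follow the scene element it binds to.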
November 20, 2024 at 6:04 PM
MooG is a self-supervised video representation model that combines transformers and recurrence to update latent tokens in a stage-wise manner.
November 20, 2024 at 6:04 PM
Excited to announce MooG for learning video representations. MooG allows tokens to move “off-the-grid”, enabling better representation of scene elements even as they move across the image plane through time.

📜https://arxiv.org/abs/2411.05927
🌐https://moog-paper.github.io/
November 20, 2024 at 6:04 PM