Roei Herzig
@roeiherz.bsky.social
Research Scientist @ IBM Research. Postdoc @ Berkeley AI. PhD @ Tel Aviv University. Working on Compositionality, Multimodal Foundation Models, and Structured Physical Intelligence.

🔗 https://roeiherz.github.io/
📍Bay area 🇺🇲
We found that 4D representations maintain a shared geometric structure between the point and robot state representations up to a linear transformation, enabling efficient transfer learning from human video data to low-level robotic control.
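A minimal sketch of how such a linear relationship could be probed (an illustration, not our released code; the array names `point_feats` and `robot_states`, the shapes, and the ridge term are assumptions): fit a least-squares linear map from the learned 4D point features to the robot state and check how much of the state it explains.

```python
# Sketch: probe whether learned 4D point features predict robot state up to a linear map.
# Hypothetical inputs: point_feats (num_steps, feat_dim), robot_states (num_steps, state_dim).
import numpy as np

def fit_linear_probe(point_feats, robot_states, ridge=1e-4):
    # Solve for W so that robot_states ≈ point_feats @ W (ridge-regularized least squares).
    d = point_feats.shape[1]
    A = point_feats.T @ point_feats + ridge * np.eye(d)
    b = point_feats.T @ robot_states
    return np.linalg.solve(A, b)

def probe_r2(point_feats, robot_states, W):
    # Coefficient of determination: how much of the robot state the linear map explains.
    pred = point_feats @ W
    ss_res = ((robot_states - pred) ** 2).sum()
    ss_tot = ((robot_states - robot_states.mean(0)) ** 2).sum()
    return 1.0 - ss_res / ss_tot

# Placeholder data; in practice the features would come from the pretrained model.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 256))
states = feats @ rng.normal(size=(256, 7)) + 0.01 * rng.normal(size=(1000, 7))
W = fit_linear_probe(feats, states)
print(f"linear-probe R^2: {probe_r2(feats, states, W):.3f}")
```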
February 24, 2025 at 3:49 AM
Our paper: arxiv.org/pdf/2502.13142.

Our project page and code will be released soon!

Team: w/ Dantong Niu, Yuvan Sharma, Haoru Xue, Giscard Biamby, Junyi Zhang, Ziteng Ji, and Trevor Darrell.
February 24, 2025 at 3:49 AM
What happens when vision 🤝 robotics meet? 🚨 Happy to share our new work on Pretraining Robotic Foundation Models! 🔥

ARM4R is an Autoregressive Robotic Model that leverages low-level 4D Representations learned from human video data to yield a better robotic model.

BerkeleyAI 😊
February 24, 2025 at 3:49 AM
The best friend of Auto-regressive Robotic Models is 4D representations...🤖😻❤️
February 20, 2025 at 5:01 AM
For all our @neuripsconf.bsky.social friends🤖🦋, our work is presented NOW at POSTER #3701.

Come hear us talk about our work on many-shot in-context learning and test-time scaling by leveraging the model's activations! You won't be disappointed 😎

#Multimodal-InContextLearning #NeurIPS
December 12, 2024 at 7:13 PM
This fantastic work was done by the outstanding students Brandon Huang, Chancharik Mitra, and Tianning Chai, together with Zhiqiu Lin, Assaf Arbelle, Rogerio Feris, and Leonid Karlinsky.

I also want to give special thanks to the amazing Trevor Darrell and Deva Ramanan for their invaluable guidance.
December 4, 2024 at 9:24 PM
We evaluated several different benchmarks, including Safety, Visual Question Answering (VQA), and Classification.

The results suggest that SAVs are particularly useful even when compared to LoRA, especially in settings where there are not many samples available to fine-tune the model.
December 4, 2024 at 9:24 PM
What did we do? ->

We propose an algorithm for finding small sets of attention heads (~20!) whose activations act as multimodal features in Generative LMMs and can be used for discriminative VL tasks, outperforming encoder-only architectures (CLIP, SigLIP) without training.
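For intuition, here is a minimal sketch of this kind of head-selection procedure (an illustrative assumption, not the paper's exact algorithm; `head_feats`, `labels`, and the nearest-centroid scoring rule are placeholders): score each attention head by how well its activations separate a few-shot support set, keep the top ~20 heads, and classify queries by averaging centroid similarities across them, with no training.

```python
# Sketch: select a small set of attention heads whose activations work as
# discriminative features, then classify with a nearest-class-mean rule.
# Hypothetical input: head_feats (num_examples, num_heads, dim), labels (num_examples,).
import torch
import torch.nn.functional as F

def select_heads(head_feats, labels, k=20):
    # Score each head by nearest-centroid accuracy on the few-shot support set.
    n, h, d = head_feats.shape
    classes = labels.unique()
    scores = []
    for head in range(h):
        x = F.normalize(head_feats[:, head], dim=-1)
        centroids = F.normalize(torch.stack([x[labels == c].mean(0) for c in classes]), dim=-1)
        pred = (x @ centroids.T).argmax(-1)
        scores.append((classes[pred] == labels).float().mean())
    return torch.stack(scores).topk(k).indices

def classify(query_feats, head_feats, labels, heads):
    # Average cosine similarity to class centroids over the selected heads; no training.
    classes = labels.unique()
    sims = 0.0
    for head in heads:
        x = F.normalize(head_feats[:, head], dim=-1)
        q = F.normalize(query_feats[:, head], dim=-1)
        centroids = F.normalize(torch.stack([x[labels == c].mean(0) for c in classes]), dim=-1)
        sims = sims + q @ centroids.T
    return classes[sims.argmax(-1)]
```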
December 4, 2024 at 9:24 PM
Hello World!
November 23, 2024 at 9:20 PM