Juan Carlos Niebles
jcniebles.bsky.social
Computer Vision, MultiModal AI Agents, Video AI
Research Director at salesforceairesearch.com
Adjunct Professor at cs.stanford.edu & svl.stanford.edu
🔗 www.niebles.net
Talk is done!

Shared our work on Multimodal AI Agents at the #ICCV2025 Workshop on Multi-Modal Reasoning. 🤖

All the slides, key papers, and the research journey are consolidated in this new blog post:

📄https://www.niebles.net/blog/2025/mmagents/

@iccv.bsky.social
October 21, 2025 at 12:49 AM
Congrats Chaitanya on winning the BEST PAPER AWARD 🥇 🏆

Check out details of our work:

arxiv.org/abs/2504.12513
June 12, 2025 at 9:07 PM
Our first #cvpr2025 poster is up!

🕐Come check it out right now until 13:00

“AdaVid: Adaptive Video-Language Pretraining”

🪧ExHall D Poster # 203

📝 arxiv.org/abs/2504.12513
June 12, 2025 at 5:01 PM
Just finished a day at the #CVPR2025 Area Chair workshop. Lots of interesting discussions and ideas, plus a chance to reconnect with colleagues and friends.

Had the chance to present our ViUnit poster to fellow ACs. If you missed it, come to our Sunday poster session.

See details in the 🧵⬇️
June 11, 2025 at 2:17 AM
Just dropped a new blog post: "Level up your Agents: Teaching Vision-Language Models to Play by the Rules"! We're exploring how to make Vision-Language Models (VLMs) even smarter at interactive tasks.

blog: www.niebles.net/blog/2025/vl...

arxiv: arxiv.org/abs/2505.03181
#multimodalAI #agents #VLM
June 4, 2025 at 7:44 PM
New blog post: "Are your Visual Programs Right for the Wrong Reasons?" 🤔

Dive into the motivation behind our @cvprconference.bsky.social #CVPR2025 paper!

📰 Blog: www.niebles.net/blog/2025/vi...
➡️ Project: artemisp.github.io/viunit/
📄 Paper: arxiv.org/abs/2412.08859

Work by Artemis P & Honglu Z
April 18, 2025 at 9:08 PM
Workshop days are always the most engaging and rewarding. Here are my two picks for this weekend:

Saturday
Video-language
video-and-language-workshop-2024.webflow.io

Sunday
Multimodal algorithmic reasoning
marworkshop.github.io/neurips24/

Do you have other recommendations?
#NeurIPS #NeurIPS2024
December 14, 2024 at 1:39 AM
It’s on! Join us
#NeurIPS2024 #NeurIPS

📍 West Ballroom A-D #5106
🗓 4:30 p.m.

📄Arxiv: arxiv.org/abs/2412.03567
💭Blog: www.niebles.net/blog/2024/sdqes/
🌎Website: sdqesdataset.github.io
December 14, 2024 at 12:48 AM
Join us at our API-Gen poster!
#NeurIPS2024 #NeurIPS

📍West Ballroom A-D #5307
📝 arxiv.org/abs/2406.18518
🌎 apigen-pipeline.github.io
December 12, 2024 at 7:46 PM
To learn more, please visit our #NeurIPS2024 poster in Vancouver

📍 West Ballroom A-D #5106
🗓Fri 13 Dec 4:30 p.m.

In the meantime, check out the following resources:

📄Arxiv: arxiv.org/abs/2412.03567
💭Blog: www.niebles.net/blog/2024/sdqes/
🌎Website: sdqesdataset.github.io
December 6, 2024 at 6:33 PM
We found that existing methods struggle to solve this problem. We make the benchmark available to everyone and hope to see some interesting improvements soon.
December 6, 2024 at 6:33 PM
We also studied the performance of various streaming architectures, including several adapter layers to aggregate temporal information.
December 6, 2024 at 6:33 PM
For the dataset, we leverage Ego4D videos and annotations, and augment them with an automated pipeline to generate data samples for our task.
December 6, 2024 at 6:33 PM
For this to work, the smart glasses would start capturing video only after receiving a user prompt, then process the video on-device in a streaming fashion, looking to detect the start of the event indicated by the user.

Our paper introduces a benchmark to study this task.
December 6, 2024 at 6:33 PM
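To make the streaming setup above concrete, here is a minimal sketch of detecting the start of an event from a stream of per-frame scores. It is an illustration only, not the method from the paper: the scores are assumed to come from some hypothetical model that rates how well each frame matches the user's query, and the function name, `threshold`, and `window` are all made-up choices for this example.

```python
from collections import deque
from typing import Iterable, Optional


def detect_event_start(
    scores: Iterable[float],
    threshold: float = 0.5,  # illustrative value, not from the paper
    window: int = 3,         # illustrative smoothing window
) -> Optional[int]:
    """Return the index of the first frame where the mean of the last
    `window` scores reaches `threshold`, or None if that never happens.

    Frames are consumed one at a time, so only the last `window`
    scores are ever held in memory.
    """
    recent = deque(maxlen=window)
    for i, score in enumerate(scores):
        recent.append(score)
        # Only decide once a full window of frames has been seen.
        if len(recent) == window and sum(recent) / window >= threshold:
            return i  # stop reading the stream as soon as the event is detected
    return None


# Hypothetical per-frame match scores for a short clip.
frame_scores = [0.1, 0.2, 0.1, 0.6, 0.7, 0.9, 0.8]
first_hit = detect_event_start(frame_scores)
```

Averaging over a short window trades a small detection delay for fewer false alarms from a single noisy frame; a real system would tune both values against the benchmark.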
🤖 “Hey assistant, don't let me forget my card at the ATM!”

🕶 Are you as forgetful as I am? Imagine you could ask your smart glasses to give you a reminder the next time a certain event happens… wouldn’t that be amazing?
December 6, 2024 at 6:33 PM
As countries across the globe have increasingly prioritized AI, year-over-year rankings in AI leadership have changed.
November 22, 2024 at 5:03 PM
The tool ranks 36 countries using 42 AI-specific indicators. It is among the most comprehensive indices of its kind. A key finding: the U.S. currently leads in AI, followed by China in a distant second and the United Kingdom in third.
November 22, 2024 at 5:03 PM