Mrinal Verghese
mrinal-verghese.bsky.social
PhD student at Carnegie Mellon Robotics Institute.
I work on task learning for household robots.
He/Him.
http://mrinal.verghese.org
5/ 2) Grounding errors, where the LLM fails to recognize previously completed actions or suggests actions for a different variation of the task, are the dominant error modes. We can make progress in this domain by better enabling LLMs to attend to long visual histories.
February 23, 2025 at 10:07 PM
4/ 1) Encoding the visual task history using the Socratic approach is more effective than representing this info implicitly using VCLMs. Implicit representations capture “low-level” info, which is less useful for planning than the “high-level” info in explicit text representations.
February 23, 2025 at 10:07 PM
3/ We set up a user study in which participants completed the first half of a task themselves while the LLM monitored their progress, then relied on the LLM to guide them through the rest of the task.
We came away with three important findings:
February 23, 2025 at 10:07 PM
How well do Multimodal LLMs consider visual information when creating plans to complete household activities? To answer this, we put a few multimodal LLMs on a pair of smart glasses and had participants try to solve cooking tasks while taking instructions from them.
February 23, 2025 at 10:07 PM