Hao Zhu 朱昊
@zhuhao.me
AI researcher. Postdoc at Stanford NLP. Prev: PhD, CMU LTI.
Visit https://zhuhao.me
Raising agents in the Opensocial.world
Pinned
Hao Zhu 朱昊
@zhuhao.me
· Mar 4
We are getting closer to having agents operating in the real physical world. However, can we trust frontier models to make embodied decisions 🎮 aligned with human norms 👩⚖️?
With EgoNormia, a 1.8k ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging!
Reposted by Hao Zhu 朱昊
We (w/ @diyiyang.bsky.social, @zhuhao.me, & Bodhisattwa Prasad Majumder) are excited to present our #NAACL25 tutorial on Social Intelligence in the Age of LLMs!
It will highlight long-standing and emerging challenges of AI interacting w humans, society & the world.
⏰ May 3, 2:00pm-5:30pm Room Pecos
May 3, 2025 at 1:58 PM
Reposted by Hao Zhu 朱昊
woooooo!
Out in Child Development:
"Learning Loopholes: The Development of Intentional Misunderstandings in Children"
paper: srcd.onlinelibrary.wiley.com/doi/10.1111/...
preprint-pdf: www.tomerullman.org/papers/kids_...
March 13, 2025 at 12:29 PM
This works like magic!
*Please repost* @sjgreenwood.bsky.social and I just launched a new personalized feed (*please pin*) that we hope will become a "must use" for #academicsky. The feed shows posts about papers filtered by *your* follower network. It's become my default Bluesky experience bsky.app/profile/pape...
March 11, 2025 at 7:23 PM
Reposted by Hao Zhu 朱昊
New personal project with my friend Michael Cho: RoboPapers, a podcast where we chat with authors of cool robotics papers and post the discussion on YouTube and Spotify. First one was with Duan Jiafei, who did the very cool paper SAM2Act, and it goes up Friday.
March 5, 2025 at 10:49 PM
Reposted by Hao Zhu 朱昊
🚨New Breakthrough in Tip-of-the-Tongue (TOT) Retrieval Research!
We address data limitations and offer a fresh evaluation method for these complex queries.
Curious how TREC TOT track test queries are created? Check out this thread 🧵 and our paper 📄: arxiv.org/abs/2502.17776
Tip of the Tongue Query Elicitation for Simulated Evaluation
Tip-of-the-tongue (TOT) search occurs when a user struggles to recall a specific identifier, such as a document title. While common, existing search systems often fail to effectively support TOT scena...
arxiv.org
March 5, 2025 at 1:32 AM
Reposted by Hao Zhu 朱昊
EgoNormia (egonormia.org) exposes a major gap in Vision-Language Models' understanding of the social world: they don't know how to behave when norms about the physical world *conflict* ⚔️ (<45% acc.)
But humans are naturally quite good at this (>90% acc.)
Check it out!
➡️ arxiv.org/abs/2502.20490
March 4, 2025 at 4:44 AM
We are getting closer to having agents operating in the real physical world. However, can we trust frontier models to make embodied decisions 🎮 aligned with human norms 👩⚖️?
With EgoNormia, a 1.8k ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging!
March 4, 2025 at 4:32 AM
Reposted by Hao Zhu 朱昊
Want to make a browser agent for *any* domain like banking or healthcare?
We propose methods for training LLMs with open-ended, unsupervised interaction on live websites:
✅ OSS SoTA on WebVoyager
✅ world's smallest high-performing web-agent
Try it here: nnetnav.dev
February 6, 2025 at 5:43 PM
Ever dreamed of AI agents learning through unsupervised interaction with the open world? Our latest preprint introduces NNetNav-Live, which collects training data through exploration on real websites and hindsight labeling, producing a SOTA OSS agent.
February 6, 2025 at 7:22 PM
My first bluesky post will be for my first project as a postdoc at Stanford.
Talk Arena is our first step towards building audio LMs into interactive agents. Try it out and let me know what you think. talkarena.org
Talk Arena
Interactive evaluation for audio models
talkarena.org
December 10, 2024 at 1:39 AM
Reposted by Hao Zhu 朱昊
With an increasing number of Large *Audio* Models 🔊, which one do users like the most?
Introducing talkarena.org — an open platform where users speak to LAMs and receive text responses. Through open interaction, we focus on rankings based on user preferences rather than static benchmarks.
🧵 (1/5)
December 10, 2024 at 12:01 AM