Hao Zhu 朱昊
@zhuhao.me
AI researcher. Postdoc at Stanford NLP. Prev: PhD, CMU LTI.
Visit https://zhuhao.me
Raising agents in the Opensocial.world
Pinned
Hao Zhu 朱昊
@zhuhao.me
· Mar 4
We are getting closer to having agents operating in the real physical world. However, can we trust frontier models to make embodied decisions 🎮 aligned with human norms 👩⚖️?
With EgoNormia, a 1.8k ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging!
Reposted by Hao Zhu 朱昊
We (w/ @diyiyang.bsky.social, @zhuhao.me, & Bodhisattwa Prasad Majumder) are excited to present our #NAACL25 tutorial on Social Intelligence in the Age of LLMs!
It will highlight long-standing and emerging challenges of AI interacting w humans, society & the world.
⏰ May 3, 2:00pm-5:30pm Room Pecos
May 3, 2025 at 1:58 PM
Reposted by Hao Zhu 朱昊
woooooo!
Out in Child Development:
"Learning Loopholes: The Development of Intentional Misunderstandings in Children"
paper: srcd.onlinelibrary.wiley.com/doi/10.1111/...
preprint-pdf: www.tomerullman.org/papers/kids_...
March 13, 2025 at 12:29 PM
This works like magic!
*Please repost* @sjgreenwood.bsky.social and I just launched a new personalized feed (*please pin*) that we hope will become a "must use" for #academicsky. The feed shows posts about papers filtered by *your* follower network. It's become my default Bluesky experience bsky.app/profile/pape...
March 11, 2025 at 7:23 PM
Reposted by Hao Zhu 朱昊
New personal project with my friend Michael Cho: RoboPapers, a podcast where we chat with authors of cool robotics papers and post the discussion on YouTube and Spotify. First one was with Duan Jiafei, who did the very cool paper SAM2Act, and it goes up Friday.
March 5, 2025 at 10:49 PM
Reposted by Hao Zhu 朱昊
🚨New Breakthrough in Tip-of-the-Tongue (TOT) Retrieval Research!
We address data limitations and offer a fresh evaluation method for these complex queries.
Curious how TREC TOT track test queries are created? Check out this thread 🧵 and our paper 📄: arxiv.org/abs/2502.17776
Tip of the Tongue Query Elicitation for Simulated Evaluation
Tip-of-the-tongue (TOT) search occurs when a user struggles to recall a specific identifier, such as a document title. While common, existing search systems often fail to effectively support TOT scena...
arxiv.org
March 5, 2025 at 1:32 AM
Reposted by Hao Zhu 朱昊
EgoNormia (egonormia.org) exposes a major gap in Vision-Language Models' understanding of the social world: they don't know how to behave when norms about the physical world *conflict* ⚔️ (<45% acc.)
But humans are naturally quite good at this (>90% acc.)
Check it out!
➡️ arxiv.org/abs/2502.20490
March 4, 2025 at 4:44 AM
We are getting closer to having agents operating in the real physical world. However, can we trust frontier models to make embodied decisions 🎮 aligned with human norms 👩⚖️?
With EgoNormia, a 1.8k ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging!
March 4, 2025 at 4:32 AM
Reposted by Hao Zhu 朱昊
Want to make a browser agent for *any* domain like banking or healthcare?
We propose methods for training LLMs with open-ended, unsupervised interaction on live websites:
✅ OSS SoTA on WebVoyager
✅ world's smallest high-performing web-agent
Try it here: nnetnav.dev
February 6, 2025 at 5:43 PM
Ever dreamed of AI agents learning through unsupervised interaction with the open world? Our latest preprint introduces NNetNav-Live, which collects training data through exploration on real websites and hindsight labeling, producing a SOTA OSS agent.
February 6, 2025 at 7:22 PM
My first bluesky post will be for my first project as a postdoc at Stanford.
Talk Arena is our first step towards building audio LMs into interactive agents. Try it out and let me know what you think. talkarena.org
Talk Arena
Interactive evaluation for audio models
talkarena.org
December 10, 2024 at 1:39 AM
Reposted by Hao Zhu 朱昊
With an increasing number of Large *Audio* Models 🔊, which one do users like the most?
Introducing talkarena.org — an open platform where users speak to LAMs and receive text responses. Through open interaction, we focus on rankings based on user preferences rather than static benchmarks.
🧵 (1/5)
December 10, 2024 at 12:01 AM