Zory Zhang
zoryzhang.bsky.social
Computational modeling of human learning: cognitive development, language acquisition, social learning, causal learning... Brown PhD student with @daphnab.bsky.social
Reposted by Zory Zhang
On the other hand, any conclusions we draw will have a very short shelf life, because the machines are in constant flux. Meanwhile, the ease with which one can do this research will sap the attention of researchers away from the harder work of understanding humans.
December 18, 2025 at 6:18 PM
Reposted by Zory Zhang
The viral "Definition of AGI" paper tells you to read fake references which do not exist!

Proof: different articles appear at the specified journal/volume/page numbers, and the cited titles exist nowhere in any searchable repository.

Take this as a warning to not use LMs to generate your references!
October 18, 2025 at 12:54 AM
Reposted by Zory Zhang
#CoreCognition #LLM #multimodal #GrowAI We spent 3 years curating 1,503 classic experiments spanning 12 core concepts in human cognitive development, and evaluated 230 MLLMs with 11 different prompts, 5 times each, yielding over 3.8 million inference data points.

A thread (1/n) - #ICML2025
June 30, 2025 at 6:07 AM
Reposted by Zory Zhang
Beautiful to see this initiative from a group of like-minded PhD students collaborating! 🚀
New Paper Alert ‼️ Current VLMs completely fail at human gaze understanding 🙀 and scaling does NOT help ‼️

However, humans, from an extremely early age 🧒, are extremely sensitive to other people's gaze 🙄 👀

No mentors, no labs, only pre-doc students, 111 VLMs, and we did it 😎
June 11, 2025 at 11:49 PM
Reposted by Zory Zhang
New Paper Alert ‼️ Current VLMs completely fail at human gaze understanding 🙀 and scaling does NOT help ‼️

However, humans, from an extremely early age 🧒, are extremely sensitive to other people's gaze 🙄 👀

No mentors, no labs, only pre-doc students, 111 VLMs, and we did it 😎
June 11, 2025 at 11:21 PM
👁️ 𝐂𝐚𝐧 𝐕𝐢𝐬𝐢𝐨𝐧 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥𝐬 (𝐕𝐋𝐌𝐬) 𝐈𝐧𝐟𝐞𝐫 𝐇𝐮𝐦𝐚𝐧 𝐆𝐚𝐳𝐞 𝐃𝐢𝐫𝐞𝐜𝐭𝐢𝐨𝐧?
Knowing where someone looks is key to a Theory of Mind. We test 111 VLMs and 65 humans to compare their inferences.
Project page: grow-ai-like-a-child.github.io/gaze/
🧵1/11
June 12, 2025 at 5:04 PM
Reposted by Zory Zhang
Sam is 100% correct on this. Indeed, human babies have essential cognitive priors such as permanence, continuity, and boundary of objects, 3D Euclidean understanding of space, etc.

We spent 2 years systematically examining and demonstrating the lack of such priors in MLLMs: arxiv.org/abs/2410.10855
May 24, 2025 at 5:55 AM