Kathy Garcia
@gkathy.bsky.social
Computational Cognitive Science PhD at Johns Hopkins with Leyla Isik

| BS @Stanford |

| 🔗 https://garciakathy.github.io/ |
In follow-up experiments, we show this model generalizes better to novel social tasks and avoids catastrophic forgetting, preserving baseline performance on action recognition tasks.
October 3, 2025 at 1:48 PM
After fine-tuning, the video model both shares more variance with language models AND captures more unique variance in human judgments, indicating it learned both language-like semantics and additional visual social nuances.
October 3, 2025 at 1:48 PM
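(A minimal sketch of the kind of variance partitioning described above, assuming per-pair human similarity judgments plus video-model and language-model similarities as predictors; all names and data here are hypothetical placeholders, not the paper's exact analysis.)

```python
# Variance-partitioning sketch: how much of human similarity is shared between
# video and language features, and how much is unique to each predictor.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_pairs = 500
video_sim = rng.normal(size=(n_pairs, 1))                          # video-model similarity per pair
lang_sim = 0.6 * video_sim + 0.8 * rng.normal(size=(n_pairs, 1))   # partly overlapping language similarity
human_sim = (0.5 * video_sim + 0.5 * lang_sim + 0.3 * rng.normal(size=(n_pairs, 1))).ravel()

def r2(X, y):
    """Cross-validated R^2 of a linear map from predictors to human judgments."""
    return cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2").mean()

r2_video = r2(video_sim, human_sim)
r2_lang = r2(lang_sim, human_sim)
r2_both = r2(np.hstack([video_sim, lang_sim]), human_sim)

unique_video = r2_both - r2_lang    # variance only the video features explain
unique_lang = r2_both - r2_video    # variance only the language features explain
shared = r2_video + r2_lang - r2_both

print(f"unique video: {unique_video:.3f}, unique language: {unique_lang:.3f}, shared: {shared:.3f}")
```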
Despite the task being purely visual, caption embeddings from a language model predict human similarity judgments better than any pretrained video model (e.g., mpnet-base-v2 > TimeSformer-base).
October 3, 2025 at 1:48 PM
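(To illustrate the language-embedding baseline in the post above: a rough sketch of scoring caption embeddings against an odd-one-out triplet judgment with sentence-transformers. The mpnet checkpoint matches the post; the example captions and the scoring rule are assumptions, not the paper's exact protocol.)

```python
# Sketch: predict the "odd one out" in a triplet from caption embeddings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")

# One triplet of video captions; suppose the human-chosen odd one out is index 2.
captions = [
    "Two people shake hands and start a conversation.",
    "A pair of friends greet each other warmly.",
    "A person chops vegetables alone in a kitchen.",
]
emb = model.encode(captions, convert_to_tensor=True)
sims = util.cos_sim(emb, emb)

# The predicted odd one out is the caption least similar to the other two.
totals = [sims[i, (i + 1) % 3].item() + sims[i, (i + 2) % 3].item() for i in range(3)]
predicted = min(range(3), key=lambda i: totals[i])
print(f"predicted odd one out: {predicted} (human choice: 2)")
```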
🚨New preprint w/ @lisik.bsky.social!
Aligning Video Models with Human Social Judgments via Behavior-Guided Fine-Tuning

We introduce a ~49k triplet social video dataset, uncover a modality gap (language > video), and close it via novel behavior-guided fine-tuning.
🔗 arxiv.org/abs/2510.01502
October 3, 2025 at 1:48 PM
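(A minimal PyTorch sketch of what triplet-based, behavior-guided fine-tuning could look like: pull the two videos people grouped together closer in the video model's embedding space and push the odd one out away. The encoder, feature shapes, and hyperparameters below are placeholders; the paper's objective and architecture details may differ.)

```python
# Sketch of behavior-guided fine-tuning with a triplet margin loss.
import torch
import torch.nn as nn

# Placeholder "video model": in practice this would be a pretrained backbone
# (e.g., a TimeSformer) returning one embedding per clip.
class TinyVideoEncoder(nn.Module):
    def __init__(self, feat_dim=512, embed_dim=256):
        super().__init__()
        self.proj = nn.Linear(feat_dim, embed_dim)

    def forward(self, clip_feats):
        return nn.functional.normalize(self.proj(clip_feats), dim=-1)

model = TinyVideoEncoder()
criterion = nn.TripletMarginLoss(margin=0.2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Each human triplet judgment picks an odd one out; the remaining two act as
# anchor/positive and the odd one out as the negative. Random features here.
anchor_feats = torch.randn(32, 512)
positive_feats = torch.randn(32, 512)
negative_feats = torch.randn(32, 512)

loss = criterion(model(anchor_feats), model(positive_feats), model(negative_feats))
loss.backward()
optimizer.step()
print(f"triplet loss: {loss.item():.3f}")
```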
📹 While most model features (like architecture or training objective) did not affect performance, we saw a big advantage for video over image models along the lateral stream. But no model tested could predict anterior lateral stream responses well. [5/6]
April 23, 2025 at 6:08 PM
🔍 Unlike visual scene features and ventral stream responses, vision models struggled to match human action and social interaction ratings, and did a poor job of predicting brain responses along the recently proposed lateral stream, specialized for social perception. [4/6]
April 23, 2025 at 6:08 PM
🧠 We benchmarked 350+ image, video, and language models against human behavioral and neural responses to dynamic, social videos. [3/6]
April 23, 2025 at 6:08 PM
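(For flavor, a small sketch of the standard encoding-model approach used in benchmarks like the one above: cross-validated ridge regression from a model's features to brain responses. The arrays are random placeholders, not the study's data or its exact pipeline.)

```python
# Encoding-model sketch: map model features to voxel responses and score
# prediction accuracy with cross-validated correlation.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_videos, n_features, n_voxels = 200, 512, 100
features = rng.normal(size=(n_videos, n_features))    # e.g., one video model's embeddings
responses = rng.normal(size=(n_videos, n_voxels))     # fMRI responses to the same videos

scores = np.zeros(n_voxels)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(features):
    reg = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(features[train], responses[train])
    pred = reg.predict(features[test])
    # Pearson correlation per voxel, averaged across folds.
    for v in range(n_voxels):
        scores[v] += np.corrcoef(pred[:, v], responses[test][:, v])[0, 1] / 5

print(f"mean voxel-wise prediction r: {scores.mean():.3f}")
```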
🎥 Real-world vision is dynamic, involving complex social interactions. Current AI models provide a good match to humans in static scene vision, but how do they fare with dynamic, social stimuli? 🤔 We set out to explore this! [2/6]
April 23, 2025 at 6:08 PM
📢 Excited to announce our paper at #ICLR2025: “Modeling dynamic social vision highlights gaps between deep learning and humans”! w/ @emaliemcmahon.bsky.social, Colin Conwell, Mick Bonner, @lisik.bsky.social


📆 Thu, Apr 24: 3:00-5:30 - Poster session 2 (#64)
📄 bit.ly/4jISKES... [1/6]
April 23, 2025 at 6:08 PM