I’m excited to stay at FAIR and work with @asli-celikyilmaz.bsky.social and friends on fun LLM questions; I’ll be working from the New York office so we’re sticking around.
We introduce SAMD & SAMI — a novel, concept-agnostic approach to identify and manipulate attention modules in transformers.
There are two open positions:
1. Summer research position (best for master's or graduate students); focus on computational social cognition.
2. Postdoc (currently interviewing!); focus on computational social cognition and AI safety.
sites.google.com/corp/site/sy...
We study systematic generalization in a safety setting and find LLMs struggle to consistently respond safely when we vary how we ask naive questions. More analyses in the paper!
Introducing our work SAGE-Eval, a benchmark consisting of 100+ safety facts and 10k+ scenarios to test this!
- Claude-3.7-Sonnet passes only 57% of facts evaluated
- o1 and o3-mini passed <45%! 🧵
"Re-evaluating Theory of Mind evaluation in large language models"
(by Hu* @jennhu.bsky.social , Sosa, and me)
link: arxiv.org/pdf/2502.21098
How do people compose existing concepts to create new goals? Can models generate and understand goals too?
nature.com/articles/s4225
From childhood on, people can create novel, playful, and creative goals. Models have yet to capture this ability. We propose a new way to represent goals and report a model that can generate human-like goals in a playful setting... 1/N
Pack I: go.bsky.app/KDTg6pv
Pack II: go.bsky.app/TTjTNsu
(Definitely true)
(Not very big; might grow up to be!)
(Meet Lila!!)
ACL 2025 Ling theory & Cognitive modeling track is looking for emergency reviewers. The emergency review period is 3/18-26, and these reviewers will be excluded from the ARR cycle. If you're interested, please sign up here! docs.google.com/forms/d/1fH7...
Room 260-262, starting now and until our intrinsic motivation runs out!
- How people represent (cognitive) goals, and how machines can generate such goals (presented at the IMOL workshop on Saturday)
- The potential benefits of richer, more structured goal representations... (1/3)