Adam Davies
@adamdaviesnlp.bsky.social
PhD candidate @ UIUC | NLP, interpretability, cognitive science | http://ahdavies6.github.io
Special thanks to my fantastic collaborator and primary author Amogh Mannekote for all his great work in making this paper/project happen!
October 10, 2025 at 3:47 PM
We introduce a framework for evaluating (b), finding that popular models do NOT consistently apply their learned world models when simulating social behavior. The upshot: even when models "know" how people might behave in a given situation, they often fail to apply that knowledge in actual simulations!
October 10, 2025 at 3:45 PM
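For intuition, here's a minimal sketch of what checking (b) could look like in practice. It is not the paper's exact protocol: `query_llm`, the COOPERATE/DEFECT action space, and the prompt wording are all illustrative assumptions. The idea is to elicit the model's stated belief about how a persona would act, then compare it with the persona's behavior across repeated role-played simulations.

```python
from collections import Counter

def belief_behavior_gap(query_llm, persona: str, scenario: str, n_sims: int = 50) -> float:
    """
    `query_llm` is assumed to be any callable mapping a prompt string to the
    model's text reply (e.g., a thin wrapper around your LLM client of choice).
    """
    # (1) Elicit the model's stated belief about how this persona would act.
    belief_prompt = (
        f"Consider {persona} in the following scenario:\n{scenario}\n"
        "Out of 100 such situations, in how many would this person choose to COOPERATE? "
        "Answer with a single integer."
    )
    stated = int(query_llm(belief_prompt).strip()) / 100.0

    # (2) Observe the persona's actual role-played behavior over repeated simulations.
    action_prompt = (
        f"You are {persona}. Scenario:\n{scenario}\n"
        "Choose exactly one action and reply with just that word: COOPERATE or DEFECT."
    )
    actions = Counter(query_llm(action_prompt).strip().upper() for _ in range(n_sims))
    observed = actions["COOPERATE"] / n_sims

    # (3) Belief-behavior gap: 0.0 means stated beliefs and simulated behavior agree.
    return abs(stated - observed)
```

A large gap for many personas/scenarios would be one symptom of the inconsistency described above.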
For LLM social simulations to be useful, models must both (a) learn faithful world models re: how various people might realistically behave in different circumstances; and (b) simulate behavior consistent with that world model.
October 10, 2025 at 3:45 PM
Special thanks to my fantastic collaborators @sewoong-sam-lee.bsky.social, Amogh Mannekote, Marc E. Canby, Julia Hockenmaier, @guohaoli.bsky.social, Kristy Boyer, ChengXiang Zhai, Bonnie J. Dorr, and @frapintoml.bsky.social!
October 8, 2025 at 5:09 PM
Paper 2: Do Role-Playing Agents Practice What They Preach? Belief-Behavior Alignment in LLM-Based Simulations of Human Trust (SocialSim workshop; openreview.net/forum?id=1BD...)
Do Role-Playing Agents Practice What They Preach? Belief-Behavior...
As large language models (LLMs) are increasingly studied as role-playing agents to generate synthetic data for human behavioral research, ensuring that their outputs remain coherent with their...
openreview.net
October 8, 2025 at 5:09 PM
Paper 1: Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality (main conference; openreview.net/forum?id=Xhd...)
October 8, 2025 at 5:09 PM
It was a real pleasure to work with my fantastic collaborators at @oxfordtvg.bsky.social on this project 🤗 already looking forward to our future work in this direction!

#OOD #generalization #LLM #steering #ICML
July 15, 2025 at 7:37 AM
*Come by our poster today to hear more!* 🙉 It’s Tue Jul 15 at 11am-1:30pm (East Exhibition Hall A-B #E-2800) 📍 You can also visit our project page at tomalamb.github.io/focus-instru... for more details and links 🔗
Focus Instruction Tuning (ICML25)
Updating LLM instruction tuning with adaptive test-time steerability.
tomalamb.github.io
July 15, 2025 at 7:37 AM
This forces models to learn both (a) explicit relationships between latent features and task behaviors 🎯🙅↔🛠️ and (b) how to dynamically steer generation based on those relationships 🛞🤖
July 15, 2025 at 7:37 AM
The core idea is to train LLMs to generate different responses to the same task instances by conditioning on “focus”/“ignore” instructions 💡
July 15, 2025 at 7:37 AM
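Here's a hedged sketch of how such focus/ignore training pairs might be built; the field names, label logic, and toy example are illustrative assumptions, not the released FIT pipeline. The same input appears several times with different focus/ignore instructions, each paired with the response the model should give under that instruction.

```python
def make_focus_ignore_pairs(example: dict) -> list[dict]:
    """
    Build instruction-tuning items from one task instance.
    `example` is assumed to carry: 'text', 'causal_feature', 'spurious_feature',
    'label_causal' (label implied by the causal feature), and
    'label_spurious' (label implied by the spurious feature).
    """
    base = f"Task input: {example['text']}\n"
    return [
        {   # Focus instruction: answer according to the causal feature only.
            "instruction": base + f"Focus on {example['causal_feature']} when answering.",
            "response": example["label_causal"],
        },
        {   # Ignore instruction: disregard the spurious feature, so the answer
            # should follow the causal feature rather than the shortcut.
            "instruction": base + f"Ignore {example['spurious_feature']} when answering.",
            "response": example["label_causal"],
        },
        {   # Focus-on-spurious instruction: the same input now demands a different
            # response, forcing the model to learn the explicit feature-behavior link.
            "instruction": base + f"Focus on {example['spurious_feature']} when answering.",
            "response": example["label_spurious"],
        },
    ]

# Toy usage: a sentiment instance where review length is a spurious shortcut.
pairs = make_focus_ignore_pairs({
    "text": "A long, rambling review that is ultimately positive.",
    "causal_feature": "the reviewer's expressed sentiment",
    "spurious_feature": "the length of the review",
    "label_causal": "positive",
    "label_spurious": "negative",
})
for p in pairs:
    print(p["instruction"], "->", p["response"])
```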
Great news — we developed an approach to improve instruction tuning so that the “how”/steering instructions DO work, and it even generalizes to unseen features and tasks! 🎉
July 15, 2025 at 7:37 AM
This means it’s ineffective to simply ask models to focus on the “right” (causal 🎯) features and ignore the “wrong” (spurious/biased 🙅) ones, which can lead to poor generalization and biased behaviors 😬 Wouldn’t it be cool if that DID work, though? 🤔
July 15, 2025 at 7:37 AM
Traditional instruction tuning teaches LLMs to perform open-ended tasks given text instructions 💬🤖🛠️ But standard techniques are ineffective for controlling (steering 🛞) HOW models should perform the task
July 15, 2025 at 7:37 AM
Come by our lightning talk at 3:40pm or our poster session at 4pm to hear more 🙉 (both are located in the East Ballroom A/B). Hope to see you there!
December 15, 2024 at 10:44 PM
But interpretability methods can sometimes be unreliable 🔬👎 In our second paper (openreview.net/forum?id=tmp...), we define and measure their reliability, finding that concept removal methods are unreliable and counterfactual methods have key tradeoffs between different experimental goals
Measuring the Reliability of Causal Probing Methods: Tradeoffs...
Causal probing aims to analyze foundation models by examining how intervening on their representation of various latent properties impacts their outputs. Recent works have cast doubt on the...
openreview.net
December 15, 2024 at 10:44 PM
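To make the object of study concrete, here's a generic linear concept-removal intervention of the kind whose reliability such analyses probe; this is a standard projection trick for illustration, not necessarily any specific method compared in the paper.

```python
import numpy as np

def remove_concept(h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Project hidden state `h` onto the subspace orthogonal to concept direction `w`."""
    w = w / np.linalg.norm(w)
    return h - np.dot(h, w) * w

# Toy check: after removal, a linear probe along `w` can no longer read out the concept.
rng = np.random.default_rng(0)
h = rng.normal(size=256)      # a hidden representation
w = rng.normal(size=256)      # probe weights encoding some latent property
h_cf = remove_concept(h, w)   # counterfactual representation

print("probe score before:", float(np.dot(h, w / np.linalg.norm(w))))
print("probe score after: ", float(np.dot(h_cf, w / np.linalg.norm(w))))  # ~0.0
```

Reliability questions then ask, e.g., whether the property is really gone (not just hidden from this probe) and whether the intervention damages unrelated information.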
Models fail to generalize under distribution shift if they rely on spurious features 📉🙅 In CALM (openreview.net/forum?id=x6Z...), we study whether models rely more on spurious or causal features for a range of tasks -- TLDR: they do both, leading to high performance ceilings but low floors!
Competence-Based Analysis of Language Models
Despite the recent successes of large, pretrained neural language models (LLMs), comparatively little is known about the representations of linguistic structure they learn during pretraining, which...
openreview.net
December 15, 2024 at 10:44 PM
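One simple way to see the "high ceiling, low floor" pattern (an illustrative aligned-vs-conflicting split, not CALM's exact protocol; the field names below are assumptions) is to compare accuracy on examples where the spurious feature agrees with the causal label against examples where it conflicts:

```python
def reliance_gap(results: list[dict]) -> float:
    """
    `results` is assumed to hold dicts with boolean fields:
      'correct' - did the model predict the true label?
      'aligned' - does the spurious feature point to the same label as the causal one?
    Returns accuracy(aligned) - accuracy(conflicting): the ceiling-vs-floor gap.
    """
    def acc(rows):
        return sum(r["correct"] for r in rows) / max(len(rows), 1)

    aligned = [r for r in results if r["aligned"]]
    conflicting = [r for r in results if not r["aligned"]]
    return acc(aligned) - acc(conflicting)

# Toy usage: strong when features agree, weak when they conflict.
toy = (
    [{"correct": True, "aligned": True}] * 9 + [{"correct": False, "aligned": True}] * 1
    + [{"correct": True, "aligned": False}] * 4 + [{"correct": False, "aligned": False}] * 6
)
print(reliance_gap(toy))  # 0.9 - 0.4 = 0.5
```

A gap near zero suggests the model leans on the causal feature; a large gap suggests it is riding the spurious one.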
Special thanks to my fabulous co-authors Arshia Hemmat, Tom Lamb, @dydyydyyyd.bsky.social, Phil Torr, Ashkan Khakzar, and @frapintoml.bsky.social -- loved working with you all, and can't wait for our next paper! 🚀
December 13, 2024 at 12:23 AM
I'm excited to be presenting our paper -- Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models -- today at NeurIPS (West Ballroom A-D, Poster 5202). Hope to see you there!
December 13, 2024 at 12:23 AM
Shape perception is fundamental to human vision 👁️🔷 but years of research on shape vs texture bias has relied on benchmarks that are simplistic relative to today's best VLMs 🤖🧠 It's time for a new dataset generated with methods as powerful as the models we're testing! 🦾
December 13, 2024 at 12:23 AM