Guy Davidson
@guydav.bsky.social
@guyd33 on the X-bird site. PhD student at NYU, broadly cognitive science x machine learning, specifically richer representations for tasks and cognitive goals. Otherwise found cooking, playing ultimate frisbee, and making hot sauces.
Thank you, Ed!!
September 17, 2025 at 9:06 PM
Tune in tomorrow for belated update #2, on post-PhD plans!
September 17, 2025 at 7:46 PM
I owe tremendous thanks to many other people, all (or, hopefully, at least most) of whom I mentioned in my acknowledgments. I’m also so grateful that my dad could represent my family, and to my wife, Sarah, for, well, everything.
September 17, 2025 at 7:46 PM
Much, much larger thanks to my advisors, @brendenlake.bsky.social and @toddgureckis.bsky.social, for your guidance and mentorship over the last several years. I appreciate you so much, and this wouldn’t have looked the same without you!
September 17, 2025 at 7:46 PM
Wherever good coffee is to be found, the rest of the time. Don't hesitate to reach out!

(also happy to talk about job search in industry and what that looks and feels like these days)
July 30, 2025 at 3:47 PM
Find me at Saturday's poster session (P3-D-44) to talk about our goal inference work, in a new physics-based environment we developed: escholarship.org/uc/item/6tb2...
Goal Inference using Reward-Producing Programs in a Novel Physics Environment
Author(s): Davidson, Guy; Todd, Graham; Colas, Cédric; Chu, Junyi; Togelius, Julian; Tenenbaum, Joshua B.; Gureckis, Todd M.; Lake, Brenden | Abstract: A child invents a game, describes its rules, and ...
escholarship.org
July 30, 2025 at 3:47 PM
Today's Minds in the Making: Design Thinking and Cognitive Science Workshop (Pacific E):

minds-making.github.io
July 30, 2025 at 3:47 PM
Finally, if this work makes you think "I'd like to work with this person," please reach out -- I'm on the job market for industry post-PhD roles (keywords: language models, interpretability, open-endedness, user intent understanding, alignment).
See more: guydavidson.me
May 23, 2025 at 5:38 PM
As with pretty much everything else I've worked on in grad school, this work would have looked different (and almost certainly worse) without the guidance of my advisors, @brendenlake.bsky.social and @toddgureckis.bsky.social. I continue to appreciate your thoughtful engagement with my work! 16/N
May 23, 2025 at 5:38 PM
This work would also have been impossible without @adinawilliams.bsky.social's guidance, the freedom she gave me in picking a problem to study, and her belief that I could tackle it despite it being my first foray into (mechanistic) interpretability work. 15/N
May 23, 2025 at 5:38 PM
We owe a great deal of gratitude to @ericwtodd.bsky.social, not only for open-sourcing their code, but also for answering our numerous questions over the last few months. If you find this interesting, you should also read their paper introducing function vectors. 14/N
May 23, 2025 at 5:38 PM
See the paper for a description of the methods, the many different controls we ran, our discussion and limitations, examples of our instructions and baselines, and other odd findings (applying an FV twice can be beneficial! Some attention heads have negative causal effects!) 13/N
May 23, 2025 at 5:38 PM
Finding 5 bonus: Which post-training steps facilitate this? Using the OLMo-2 model family, we find that the SFT and DPO stages each bring a jump in performance, but the final RLVR step doesn't make a difference for the ability to extract instruction FVs. 12/N
May 23, 2025 at 5:38 PM
Finding 5: We can steer base models with instruction FVs extracted from their post-trained versions. We didn't expect this to work! It's less effective for the Llama-3.2 models, which are smaller and distilled. We're also excited to dig into this and see where we can push it. 11/N
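For anyone who wants the shape of this in code, here's a minimal sketch of the steering setup (hypothetical names, simplified from the usual function-vector recipe; not our exact pipeline). The FV is the sum of the top causal heads' mean outputs, computed on the post-trained model, and then added into one of the base model's layers during generation:

```python
import torch

# Minimal sketch of instruction-FV steering (hypothetical names, simplified).
# `mean_head_outputs[(layer, head)]` is assumed to hold each attention head's
# output averaged over instruction prompts, computed on the POST-TRAINED model;
# `top_heads` are the heads with the largest causal effects.

def build_instruction_fv(mean_head_outputs, top_heads):
    """Sum the mean outputs of the top causal heads into a single vector."""
    return torch.stack([mean_head_outputs[h] for h in top_heads]).sum(dim=0)

def make_steering_hook(fv, scale=1.0):
    """Forward hook that adds the FV to a transformer layer's output."""
    def hook(module, args, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * fv.to(hidden.dtype)
        return (hidden,) + tuple(output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Usage sketch: register the hook on a mid-depth layer of the BASE model,
# then generate as usual. (The layer index is illustrative.)
# fv = build_instruction_fv(mean_head_outputs, top_heads)
# handle = base_model.model.layers[15].register_forward_hook(make_steering_hook(fv))
# out = base_model.generate(**inputs)
# handle.remove()
```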
May 23, 2025 at 5:38 PM
Finding 4: The relationship between demonstrations and instructions is asymmetric. Especially in post-trained models, the top attention heads for instructions appear peripherally useful for demonstrations, more so than the reverse (see paper for details). 10/N
May 23, 2025 at 5:38 PM
We (preliminarily) interpret this as evidence that the effect of post-training is _not_ in adapting the model to represent instructions with the mechanism used for demonstrations, but in developing a mostly complementary mechanism. We're excited to dig into this further. 9/N
May 23, 2025 at 5:38 PM
Finding 3 bonus: examining activations in the shared attention heads, we see (a) generally increased similarity with increasing model depth, and (b) no difference in similarity between base and post-trained models (circles and squares). 8/N
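A rough sketch of that similarity analysis, for the curious (hypothetical names; `instr_acts` and `demo_acts` are assumed to map each (layer, head) to a stack of that head's per-prompt activations):

```python
import torch.nn.functional as F

# Hypothetical sketch: `instr_acts` and `demo_acts` map (layer, head) ->
# a [n_prompts, d_head] tensor of that head's activations under instruction
# prompts and demonstration prompts, respectively.

def shared_head_similarities(instr_acts, demo_acts, shared_heads):
    """Cosine similarity of mean activations for each shared (layer, head)."""
    sims = {}
    for layer, head in shared_heads:
        a = instr_acts[(layer, head)].mean(dim=0)  # average over prompts
        b = demo_acts[(layer, head)].mean(dim=0)
        sims[(layer, head)] = F.cosine_similarity(a, b, dim=0).item()
    return sims

# Grouping `sims` by layer and plotting would show the qualitative trend above:
# similarity generally rises with depth, with base and post-trained models
# overlapping.
```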
May 23, 2025 at 5:38 PM