Explore the full spectrum of human–AI relationships with me in Ep1 of my new web series. This broad overview lays out my plans to dive deeper into the emotional, ethical, and cognitive impacts in future episodes.
#ai #artificialintelligence #chatgpt #llm #transformer #mechanisticinterpretability
New Minds Now (HUMAN AI RELATIONSHIPS) - Episode 001 - New Channel (EXPLANATION)
YouTube video by Cody Vaillant
youtu.be
June 15, 2025 at 10:33 AM
Statistical framing of interpretability shows high variance in EAP‑IG; small hyper‑parameter tweaks and prompt rephrasing often altered identified subnetworks. https://getnews.me/statistical-view-of-mechanistic-interpretability-shows-variance-in-eap-ig/ #eapig #mechanisticinterpretability
October 3, 2025 at 1:39 AM
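A hypothetical sketch of the kind of stability check behind that finding: run the same circuit-discovery method twice, changing only the prompt wording (or a hyper-parameter), and measure how much the two identified subnetworks overlap. The edge names and the two runs below are invented for illustration; this is not the EAP-IG API.

```python
# Compare the subnetworks (sets of edges) returned by two attribution runs
# that differ only in prompt phrasing. Low overlap = unstable circuit.
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two edge sets."""
    return len(a & b) / len(a | b) if a | b else 1.0

# Stand-in outputs from two hypothetical EAP-IG runs on rephrased prompts.
edges_run_a = {("embed", "attn.1.h0"), ("attn.1.h0", "mlp.3"), ("mlp.3", "attn.5.h2")}
edges_run_b = {("mlp.3", "attn.5.h2"), ("embed", "attn.2.h1")}

print(f"edge overlap: {jaccard(edges_run_a, edges_run_b):.2f}")  # 0.25 here
```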
So, what is #MechanisticInterpretability 🤔
Mechanistic Interpretability (MI) is the discipline of opening the black box of large language models (and other neural networks) to understand the underlying circuits, features and/or mechanisms that give rise to specific behaviours...
January 29, 2025 at 9:26 PM
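To make that concrete, here is a minimal sketch of activation patching, one common MI technique: cache an intermediate activation from a "clean" run, then splice it into a "corrupted" run and see how much of the behaviour that one site accounts for. The toy two-layer network is a stand-in, not a real transformer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy network standing in for a model under study (illustrative only).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
site = model[0]  # the "site" we intervene on: the first layer's output

clean, corrupted = torch.randn(1, 4), torch.randn(1, 4)
cache = {}

def save_hook(mod, inp, out):
    cache["act"] = out.detach()   # cache the clean activation

def patch_hook(mod, inp, out):
    return cache["act"]           # overwrite with the cached activation

h = site.register_forward_hook(save_hook)
clean_out = model(clean)          # 1) clean run: fill the cache
h.remove()

h = site.register_forward_hook(patch_hook)
patched_out = model(corrupted)    # 2) corrupted run with the clean activation
h.remove()

corrupted_out = model(corrupted)  # baseline corrupted run
# In this toy net, patching the only site fully restores the clean output;
# in a real model one patches individual heads or neurons among many.
print((patched_out - corrupted_out).abs().sum().item())
```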
This work was made possible through a great collaboration with Jingcheng (Frank) Niu, Subhabrata Dutta, Ahmed Elshabrawy, @harishtm.bsky.social, and @igurevych.bsky.social
#Interpretability #InContextLearning #TMLR #LLMs #MechanisticInterpretability #EmergentAbilities
October 15, 2025 at 7:59 AM
With all the renewed discussion of "Sparse AutoEncoders (#SAE)" as a way of doing #MechanisticInterpretability of #LLMs, I am resharing a part of my PhD in which we proved, years ago, that sparsity automatically emerges in autoencoding.
arxiv.org/abs/1708.03735
Sparse Coding and Autoencoders
In "Dictionary Learning" one tries to recover incoherent matrices $A^* \in \mathbb{R}^{n \times h}$ (typically overcomplete and whose columns are assumed to be normalized) and sparse vectors $x^* \in ...
arxiv.org
October 3, 2025 at 2:16 PM
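For readers newer to the topic, a minimal sketch of the setup under discussion, assuming the common recipe of an overcomplete ReLU autoencoder with an L1 penalty on the hidden code; the dimensions, penalty weight, and random data are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, d_hidden, l1_coeff = 16, 64, 1e-3   # overcomplete: d_hidden > d_model

enc = nn.Linear(d_model, d_hidden)
dec = nn.Linear(d_hidden, d_model)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

for step in range(200):
    x = torch.randn(32, d_model)             # stand-in for cached activations
    code = torch.relu(enc(x))                # nonnegative hidden code
    recon = dec(code)
    loss = (recon - x).pow(2).mean() + l1_coeff * code.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# The L1 term drives most hidden units to zero on any given input.
print("mean active units per input:", (code > 0).float().sum(1).mean().item())
```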
Researchers isolate memorization from reasoning in AI neural networks https://arstechni.ca... #mechanisticinterpretability #computationalneuroscience #AllenInstituteforAI #transformermodels #gradientdescent #machinelearning #AIarchitecture #AImemorization #generalization #neuralnetworks…
November 11, 2025 at 12:00 AM