Natural and Artificial Minds
Prev: PhD @ Brown, MIT FutureTech
Website: https://annatsv.github.io/
Mech interp often stays at Marr’s algorithmic level, but without the computational level (what the task is, what counts as the right solution), the mechanisms we find can look arbitrary. Why does a model learn one algorithm rather than another?
🧵 (1/2)
🤔 Neat evidence that LLMs can report on manipulated activations, with big caveats!
🧠 But leaves open: what are the “internal states” an LLM can introspect in the first place?
arxiv.org/abs/2501.15740
#AIEthics
techcrunch.com/2024/08/14/m...