Meenakshi Khosla
meenakshikhosla.bsky.social
Meenakshi Khosla
@meenakshikhosla.bsky.social
Assistant Professor at UCSD Cognitive Science and CSE (affiliate) | Past: Postdoc @MIT, PhD @Cornell, B. Tech @IITKanpur | Interested in Biological and Artificial Intelligence
Superposition has reshaped interpretability research. In our @unireps.bsky.social paper led by @andre-longon.bsky.social we show it also matters for measuring alignment! Two systems can represent the same features yet appear misaligned if those features are mixed differently across neurons.
Superposition disentanglement of neural representations reveals hidden alignment
The superposition hypothesis states that a single neuron within a population may participate in the representation of multiple features in order for the population to represent more features than the ...
arxiv.org
October 8, 2025 at 8:54 PM