
Meenakshi Khosla

@meenakshikhosla

Assistant Professor at UCSD Cognitive Science and CSE (affiliate) | Past: Postdoc @MIT, PhD @Cornell, B. Tech @IITKanpur | Interested in Biological and Artificial Intelligence

73
Followers
67
Following
6
Posts
03.04.2025
Joined

Latest posts by Meenakshi Khosla @meenakshikhosla


🔵🔴 Join us for the UniReps Workshop: Unifying Representations in Neural Models at @neuripsconf.bsky.social 2025!

πŸ“ Ballroom 20D, San Diego Convention Center
Dec 6
Don’t forget to fill out the participation form. Joining in person or remotely? We welcome your questions for the panel.
🔗 unireps.org

01.12.2025 18:40 👍 12 🔁 5 💬 0 📌 1

@andre-longon.bsky.social led/executed this project beautifully. He's applying to PhD programs this fall and would be an incredible addition to any lab!

08.10.2025 20:54 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

also thanks to @david-klindt.bsky.social for an incredible collaboration.

08.10.2025 20:54 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

The takeaway: superposition isn’t just an interpretability issue; it warps alignment metrics too. Disentangling reveals the true representational overlap, both model-to-model and model-to-brain.

08.10.2025 20:54 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Across toy models, ImageNet DNNs (ResNet, ViT), and even brain data (NSD), alignment scores jump once we replace base neurons with their disentangled SAE latents, showing that superposition can mask shared structure.

08.10.2025 20:54 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

We develop a theory showing how superposition arrangements deflate predictive-mapping metrics. Then we test it: disentangling with sparse autoencoders (SAEs) reveals hidden correspondences.

08.10.2025 20:54 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Preview: Superposition disentanglement of neural representations reveals hidden alignment
The superposition hypothesis states that a single neuron within a population may participate in the representation of multiple features in order for the population to represent more features than the ...

Superposition has reshaped interpretability research. In our @unireps.bsky.social paper, led by @andre-longon.bsky.social, we show it also matters for measuring alignment! Two systems can represent the same features yet appear misaligned if those features are mixed differently across neurons.

08.10.2025 20:54 πŸ‘ 9 πŸ” 2 πŸ’¬ 2 πŸ“Œ 0