
Meenakshi Khosla

@meenakshikhosla

Assistant Professor at UCSD Cognitive Science and CSE (affiliate) | Past: Postdoc @MIT, PhD @Cornell, B. Tech @IITKanpur | Interested in Biological and Artificial Intelligence

73
Followers
67
Following
6
Posts
03.04.2025
Joined

Latest posts by Meenakshi Khosla @meenakshikhosla


🔵🔴 Join us for the UniReps Workshop: Unifying Representations in Neural Models at @neuripsconf.bsky.social 2025!

πŸ“ Ballroom 20D, San Diego Convention Center
Dec 6
Don’t forget to fill out the participation form. Joining in person or remotely? We welcome your questions for the panel.
🔗 unireps.org

01.12.2025 18:40 👍 12 🔁 5 💬 0 📌 1

@andre-longon.bsky.social led/executed this project beautifully. He's applying to PhD programs this fall and would be an incredible addition to any lab!

08.10.2025 20:54 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

also thanks to @david-klindt.bsky.social for an incredible collaboration.

08.10.2025 20:54 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

The takeaway: superposition isn’t just an interpretability issue; it warps alignment metrics too. Disentangling reveals the true representational overlap, both model-to-model and model-to-brain.

08.10.2025 20:54 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Across toy models, ImageNet DNNs (ResNet, ViT), and even brain data (NSD), alignment scores jump once we replace base neurons with their disentangled SAE latents, showing that superposition can mask shared structure.

08.10.2025 20:54 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

We develop a theory showing how superposition arrangements deflate predictive-mapping metrics. Then we test it: disentangling with sparse autoencoders (SAEs) reveals hidden correspondences.

08.10.2025 20:54 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Preview: Superposition disentanglement of neural representations reveals hidden alignment
The superposition hypothesis states that a single neuron within a population may participate in the representation of multiple features in order for the population to represent more features than the ...

Superposition has reshaped interpretability research. In our @unireps.bsky.social paper, led by @andre-longon.bsky.social, we show it also matters for measuring alignment! Two systems can represent the same features yet appear misaligned if those features are mixed differently across neurons.

08.10.2025 20:54 πŸ‘ 9 πŸ” 2 πŸ’¬ 2 πŸ“Œ 0