Huge thanks to @ekdeepl.bsky.social , @thomasfel.bsky.social
and my advisor Demba Ba for all the assistance and contributions to this project!
In vision, SpaDE learns very interesting concepts! On ImageNette, a 10-class subset of ImageNet, for the English Springer class it learns concepts that highlight the ears, muzzle, eye region, neck, paws, etc.!
Do check out the paper: arxiv.org/abs/2503.01822 for more results!
📊 Results on real model activations: Across vision & language tasks, SpaDE recovers more monosemantic features than ReLU, JumpReLU, or TopK SAEs.
It also tiles concepts beautifully.
SpaDE also captures concept heterogeneity, adaptively allocating sparsity levels to different concepts based on their intrinsic dimension, something TopK struggles with.
SpaDE captures nonlinearly separable features better than ReLU, JumpReLU, or TopK SAEs. It also exhibits very interesting local receptive fields!
It tiles concept space more effectively, avoiding cross-concept correlations.
🛠️ Our Solution: SpaDE
We designed SpaDE, a novel SAE that explicitly accounts for nonlinear separability and heterogeneous dimensionality. SpaDE projects distances onto the probability simplex.
It recovers previously hidden concepts that standard SAEs completely miss!
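To make the "projects distances onto the probability simplex" idea concrete, here is a minimal sketch of a SpaDE-style encoder. This is an illustration under assumptions, not the paper's implementation: I assume the encoder scores each dictionary atom by a (negative, scaled) squared distance to the input and then applies the standard Euclidean projection onto the probability simplex; the names `spade_like_encode` and the temperature `beta` are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {p : p >= 0, sum(p) = 1}, via the standard sort-and-threshold
    algorithm. Entries far below the threshold are zeroed out,
    which is where the sparsity comes from."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)    # shift that enforces sum = 1
    return np.maximum(v - theta, 0.0)

def spade_like_encode(x, D, beta=1.0):
    """Hypothetical SpaDE-style encoder (a sketch, not the paper's code):
    score each dictionary atom (row of D) by its negative squared
    distance to x, then project the score vector onto the simplex.
    Nearby atoms get positive weight; distant atoms are zeroed, so the
    number of active atoms adapts to the local geometry of x."""
    dists = np.sum((D - x) ** 2, axis=1)      # squared distance to each atom
    return project_simplex(-beta * dists)
```

Because the projection (unlike a softmax) produces exact zeros, the support size varies per input, which is one plausible reading of the thread's claim about adaptive sparsity.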
🔬 Testing the Assumptions: We analyzed SAEs across different settings, from toy models to real-world neural activations.
Result? SAEs fail when concepts are not linearly separable (ReLU, JumpReLU) or are heterogeneous in dimension (TopK).
The Big Idea: SAE encoders impose constraints on the solution to the dictionary learning problem, and those constraints amount to assumptions about concepts.
SAE encoders are linear transformations followed by orthogonal projections onto different sets; the choice of set dictates the encoder's receptive fields, and hence the assumptions it makes about concepts.
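The "linear map + orthogonal projection" view can be sketched for the two standard cases. This is an illustrative reading of that sentence, not code from the paper: the ReLU encoder projects onto the nonnegative orthant, and the TopK encoder projects onto the set of k-sparse vectors (here keeping the k largest pre-activations, one common convention).

```python
import numpy as np

def relu_encode(x, W, b):
    """ReLU SAE encoder: linear map W @ x + b, followed by the
    orthogonal projection onto the nonnegative orthant
    (elementwise max with zero)."""
    return np.maximum(W @ x + b, 0.0)

def topk_encode(x, W, b, k):
    """TopK SAE encoder: same linear map, followed by projection
    onto the set of k-sparse vectors; keep the k largest entries
    and zero out the rest."""
    z = W @ x + b
    out = np.zeros_like(z)
    idx = np.argsort(z)[-k:]   # indices of the k largest pre-activations
    out[idx] = z[idx]
    return out
```

The only difference between the two encoders is the set being projected onto, which is exactly why each architecture bakes in a different assumption about what concept geometries it can represent.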
New preprint alert!
Do Sparse Autoencoders (SAEs) reveal all concepts a model relies on? Or do they impose hidden biases that shape what we can even detect?
We uncover a fundamental duality between SAE architectures and concepts they can recover.
Link: arxiv.org/abs/2503.01822