
Sumedh Hindupur

@sumedh-hindupur

Grad Student at Harvard SEAS. Interested in ML Interpretability, Computational Neuroscience, and Signal Processing.

18 Followers · 12 Following · 9 Posts · Joined 07.03.2025

Latest posts by Sumedh Hindupur @sumedh-hindupur

Huge thanks to @ekdeepl.bsky.social, @thomasfel.bsky.social, and my advisor Demba Ba for all the assistance and contributions to this project!

07.03.2025 02:53 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

In vision, SpaDE learns very interesting concepts! On ImageNette, a 10-class subset of ImageNet, its features for the English Springer class pick out the ears, muzzle, eye region, neck, paws, and more!
Do check out the paper for more results: arxiv.org/abs/2503.01822

07.03.2025 02:53 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

πŸ’‘ Results on real model activations: Across vision & language tasks, SpaDE finds monosemantic features better than ReLU, JumpReLU, or TopK SAEs.

It also tiles concepts beautifully.
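What does "monosemantic" demand of a feature? Roughly: each unit should concentrate its activation mass on a single concept. Here's a toy purity score in NumPy to make that concrete. It's my own illustration, not the metric used in the paper:

```python
import numpy as np

def purity(codes, labels):
    """Toy monosemanticity proxy: for each unit, the fraction of its
    total activation mass coming from its single most-associated
    concept (1.0 = perfectly monosemantic, 1/n_concepts = fully mixed)."""
    concepts = np.unique(labels)
    scores = []
    for j in range(codes.shape[1]):
        mass = np.array([codes[labels == c, j].sum() for c in concepts])
        if mass.sum() > 0:
            scores.append(mass.max() / mass.sum())
    return float(np.mean(scores))

# Unit 0 fires only for concept 0; unit 1 fires for both concepts.
codes = np.array([[1.0, 1.0],
                  [1.0, 1.0],
                  [0.0, 1.0],
                  [0.0, 1.0]])
labels = np.array([0, 0, 1, 1])
print(purity(codes, labels))  # (1.0 + 0.5) / 2 = 0.75
```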

07.03.2025 02:52 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

SpaDE also captures concept heterogeneity, adaptively allocating sparsity levels to different concepts based on their intrinsic dimension, something TopK struggles with.
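To make the contrast concrete, here's a toy NumPy sketch. I'm using a sparsemax-style simplex projection as a stand-in for SpaDE's encoder (the exact formulation is in the paper): TopK activates exactly k units for every input, while a simplex projection lets the number of active units track the input.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex (Duchi et al., 2008)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    return np.maximum(v - (css[rho] - 1.0) / (rho + 1.0), 0.0)

# Atom scores for two inputs: one where a single atom dominates
# (a low-dimensional concept) and one where several atoms are
# similarly close (a higher-dimensional concept).
scores_low  = np.array([4.0, 0.5, 0.3, 0.2, 0.1])
scores_high = np.array([1.0, 0.9, 0.8, 0.7, 0.1])

print(np.count_nonzero(project_simplex(scores_low)))   # 1 active unit
print(np.count_nonzero(project_simplex(scores_high)))  # 4 active units
# TopK with k=2 would force exactly 2 active units in both cases.
```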

07.03.2025 02:51 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

SpaDE captures nonlinearly separable features better than ReLU, JumpReLU, or TopK SAEs. It also learns very interesting local receptive fields!
It tiles concept space more effectively, avoiding cross-concept correlations.
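Here's the receptive-field point on toy data of my own construction: a linear + ReLU unit is active on a half-space, so it can't isolate a concept nested inside another one, while a distance-based unit is active only on a local ball.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two concepts that are not linearly separable: an inner blob and a
# ring of radius 3 surrounding it.
inner = 0.5 * rng.normal(size=(200, 2))
theta = rng.uniform(0, 2 * np.pi, size=200)
outer = np.stack([3 * np.cos(theta), 3 * np.sin(theta)], axis=1)

def relu_unit(x, w, b):
    """Half-space receptive field: active wherever w @ x + b > 0."""
    return np.maximum(x @ w + b, 0.0)

def distance_unit(x, center, radius):
    """Local receptive field: active only within `radius` of `center`."""
    return np.maximum(radius - np.linalg.norm(x - center, axis=1), 0.0)

# A half-space wide enough to cover the blob also covers much of the ring:
w, b = np.array([1.0, 0.0]), 1.0
print((relu_unit(inner, w, b) > 0).mean())   # ~0.98: fires on the blob
print((relu_unit(outer, w, b) > 0).mean())   # ~0.6: but also on the ring

# A local unit at the origin isolates the inner concept cleanly:
print((distance_unit(inner, np.zeros(2), 1.5) > 0).mean())  # ~0.99
print((distance_unit(outer, np.zeros(2), 1.5) > 0).mean())  # 0.0
```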

07.03.2025 02:51 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

πŸ› οΈ Our Solution: SpaDE

We designed SpaDE, a novel SAE that explicitly accounts for nonlinear separability and heterogeneous dimensionality. SpaDE projects distances onto the probability simplex.
It recovers previously hidden concepts that standard SAEs completely miss!
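Here's a minimal NumPy sketch of a distance-then-simplex-projection encoder. The scoring by negative squared distance to dictionary atoms and the temperature beta are my illustrative choices, not necessarily the paper's exact architecture:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex (Duchi et al., 2008).
    The output is nonnegative, sums to 1, and is typically sparse."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    return np.maximum(v - (css[rho] - 1.0) / (rho + 1.0), 0.0)

def spade_encode(x, D, beta=5.0):
    """SpaDE-style encoder sketch (my assumptions, see the paper): score
    each dictionary atom by its negative, scaled squared distance to the
    input, then project the scores onto the simplex. Nearby atoms share
    the unit mass; distant atoms are zeroed out -> sparse, local codes."""
    sq_dists = ((D - x) ** 2).sum(axis=1)
    return project_simplex(-beta * sq_dists)

rng = np.random.default_rng(0)
D = rng.normal(size=(16, 8))          # 16 dictionary atoms in R^8
x = D[3] + 0.1 * rng.normal(size=8)   # an input near atom 3
z = spade_encode(x, D)
print(z.nonzero()[0], round(z.sum(), 6))  # few active atoms; weights sum to 1
```

The simplex constraint is what makes the codes sparse and nonnegative without fixing a support size k in advance.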

07.03.2025 02:50 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

πŸ”¬ Testing the Assumptions: We analyzed SAEs across different settingsβ€”from toy models to real-world neural activations.
Result? SAEs fail when concepts have nonlinear separability (ReLU, JumpReLU) or heterogeneous concepts (TopK).

07.03.2025 02:49 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

The Big Idea: SAE encoders impose constraints on the solution to dictionary learning, and these constraints amount to assumptions about concepts.
SAE encoders are linear transformations followed by orthogonal projections onto different sets; the choice of set dictates each unit's receptive field and hence the assumptions.
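Concretely, each of these encoders computes z = Wx + b and then projects z onto a different set. The threshold and k below are illustrative values:

```python
import numpy as np

def relu_enc(z):
    """ReLU: projection onto the nonnegative orthant {z : z >= 0}."""
    return np.maximum(z, 0.0)

def jumprelu_enc(z, theta=0.5):
    """JumpReLU: keep a coordinate only if it exceeds the threshold theta,
    so each unit's receptive field is the half-space {z_i > theta}."""
    return np.where(z > theta, z, 0.0)

def topk_enc(z, k=2):
    """TopK: keep the k largest coordinates, zero the rest. For
    nonnegative z this is the projection onto k-sparse vectors, and the
    support size is fixed at k for every input."""
    out = np.zeros_like(z)
    idx = np.argsort(z)[-k:]
    out[idx] = z[idx]
    return out

z = np.array([0.9, -0.3, 0.6, 0.1])   # pre-activations z = W @ x + b
print(relu_enc(z))       # [0.9 0.  0.6 0.1]
print(jumprelu_enc(z))   # [0.9 0.  0.6 0. ]
print(topk_enc(z))       # [0.9 0.  0.6 0. ]
```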

07.03.2025 02:49 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

New preprint alert!
Do Sparse Autoencoders (SAEs) reveal all the concepts a model relies on? Or do they impose hidden biases that shape what we can even detect?
We uncover a fundamental duality between SAE architectures and the concepts they can recover.
Link: arxiv.org/abs/2503.01822

07.03.2025 02:48 πŸ‘ 14 πŸ” 2 πŸ’¬ 1 πŸ“Œ 2