Interpretability vs Utility in Sparse Autoencoders for LLM Steering
An evaluation of 90 SAEs across three LLMs found only a modest rank correlation (Kendall's tau-b ≈ 0.298) between interpretability and steering utility, while using Delta Token Confidence boosted steering performance by ~52.5%. Read more: getnews.me/interpretability-vs-util... #sparseautoencoders #llmsteering
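
For readers unfamiliar with the tau-b statistic quoted above, here is a minimal sketch of how a Kendall tau-b rank correlation between two score lists can be computed. The function name `kendall_tau_b` and the sample scores are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length score lists (O(n^2) pairwise version)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1          # pair tied in x (possibly also tied in y)
        if dy == 0:
            ties_y += 1          # pair tied in y (possibly also tied in x)
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                concordant += 1  # both scores order the pair the same way
            else:
                discordant += 1  # the scores order the pair oppositely
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        return float("nan")      # undefined when one list is constant
    return (concordant - discordant) / denom


# Hypothetical per-SAE scores, for illustration only.
interpretability = [0.61, 0.72, 0.55, 0.80, 0.67]
steering_utility = [0.30, 0.28, 0.25, 0.41, 0.35]
tau = kendall_tau_b(interpretability, steering_utility)
print(f"tau-b = {tau:.3f}")
```

A value near 1 would mean the two rankings agree almost perfectly and a value near 0 would mean they are largely unrelated, so a tau-b around 0.3 indicates only a weak tendency for more interpretable SAEs to also steer better.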