What if you could understand and control an LLM by studying its *smaller* sibling?
Our new paper introduces the Linear Representation Transferability Hypothesis. We find that the internal representations of different-sized models can be translated into one another using a simple linear (affine) map.
10.07.2025 17:26
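As a rough illustration of the hypothesis, here is a minimal sketch, assuming you already have paired hidden states from a small and a large model on the same inputs. Everything here (the array names, dimensions, and the plain least-squares fit) is a stand-in for illustration, not the paper's exact procedure:

```python
import numpy as np

# Hypothetical paired hidden states; in practice these would come from
# running the two models on the same token positions.
rng = np.random.default_rng(0)
n, d_small, d_large = 1024, 256, 512
h_small = rng.normal(size=(n, d_small))
h_large = rng.normal(size=(n, d_large))

# Fit an affine map h_large ~ h_small @ W + b by least squares:
# append a bias column to h_small and solve the augmented system.
X = np.hstack([h_small, np.ones((n, 1))])          # (n, d_small + 1)
sol, *_ = np.linalg.lstsq(X, h_large, rcond=None)  # (d_small + 1, d_large)
W, b = sol[:-1], sol[-1]

# Translate small-model representations into the large model's space.
h_translated = h_small @ W + b
print("fit MSE:", np.mean((h_translated - h_large) ** 2))
```

On real paired activations (rather than the random stand-ins above), a low fit error for such a map is the kind of evidence the hypothesis predicts.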
I'm at #NeurIPS2024 this week!
My work (arxiv.org/abs/2406.17692) w/ @gregdnlp.bsky.social & @eunsol.bsky.social exploring the connection between LLM alignment and response pluralism will be presented at the Pluralistic Alignment workshop (pluralistic-alignment.github.io) on Saturday. Drop by to learn more!
11.12.2024 17:39
We will also give a spotlight presentation of LoFiT at the #NeurIPS2024 Workshop on Foundation Model Interventions on December 15th in West Meeting Room 121, 122!
09.12.2024 22:22
Interpretability can be used to improve LLM fine-tuning - check out our poster at #NeurIPS2024!
Where: East Exhibit Hall A-C #3402 (Poster Session 2 East)
When: 11 Dec, 4:30 - 7:30 pm PST (Vancouver time)
See you in Vancouver! I'd love to chat about PEFT, interp, alignment, and more.
09.12.2024 22:22
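For readers curious what interpretability-guided fine-tuning can look like in code, below is a hedged sketch loosely in the spirit of LoFiT: freeze the base model and learn small offset vectors added to the outputs of a few selected attention heads. The module name, the head-selection list, and the wiring are my assumptions here, not the paper's exact recipe:

```python
import torch
import torch.nn as nn

class HeadOffset(nn.Module):
    """Learned bias vectors added to selected heads of an attention output.

    Assumes attn_out is the concatenation of per-head outputs,
    shaped (batch, seq, n_heads * head_dim).
    """
    def __init__(self, head_dim: int, selected: list[int]):
        super().__init__()
        self.head_dim = head_dim
        self.selected = selected
        # One trainable offset per selected head; the base model stays frozen.
        self.offsets = nn.ParameterDict(
            {str(h): nn.Parameter(torch.zeros(head_dim)) for h in selected}
        )

    def forward(self, attn_out: torch.Tensor) -> torch.Tensor:
        out = attn_out.clone()
        for h in self.selected:
            s = h * self.head_dim
            # Shift this head's slice of the output by its learned offset.
            out[..., s:s + self.head_dim] = out[..., s:s + self.head_dim] + self.offsets[str(h)]
        return out

# Hypothetical usage: offsets on heads 3 and 7 of a 12-head layer.
layer = HeadOffset(head_dim=64, selected=[3, 7])
x = torch.randn(2, 10, 12 * 64)
y = layer(x)  # same shape as x, with learned offsets added on heads 3 and 7
```

Only the offset vectors would be trained here, which is what makes this kind of approach parameter-efficient; the interpretability part lies in how the heads are selected, which this sketch leaves as a given list.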