For more details, check out our full paper: arxiv.org/abs/2412.16971 (8/8)
6️⃣ Layer-wise Insights:
Both expert specialization and syntactic encoding vary across layers.
Early layers seem to contribute most to POS prediction, as shown by ablation studies: removing early-layer routing information significantly reduces MLP accuracy. (7/8)
5️⃣ Path-Based Clustering:
t-SNE visualizations of token paths reveal distinct clusters for POS categories.
Tokens with similar syntactic roles (e.g., nouns, verbs, punctuation) follow coherent routing paths, highlighting implicit syntactic awareness. (6/8)
4️⃣ Q2: Routing Paths Encode Syntax
To test whether routing paths encode syntactic information, we trained an MLP to predict POS tags from paths (the token's sequence of experts across layers). It achieved up to 89% accuracy, showing that these paths capture linguistic characteristics. (5/8)
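As a rough illustration of what "predicting POS tags from paths" takes as input, here is a minimal sketch that turns a routing path into a feature vector a classifier could consume. The one-hot encoding, the function name `encode_path`, the toy paths, and the simplification to one expert per layer are all assumptions for illustration, not necessarily the setup used in the paper.

```python
def encode_path(path, num_experts):
    """One-hot encode a routing path.

    `path` lists the expert index chosen at each layer (assumed: one expert
    per layer). The result has one block of `num_experts` slots per layer.
    """
    features = [0] * (len(path) * num_experts)
    for layer, expert in enumerate(path):
        if not 0 <= expert < num_experts:
            raise ValueError(f"expert index {expert} out of range at layer {layer}")
        features[layer * num_experts + expert] = 1
    return features


# Hypothetical toy paths across 3 layers with 4 experts per layer.
toy_paths = {"the": [0, 2, 1], "runs": [3, 1, 1]}
encoded = {token: encode_path(path, num_experts=4) for token, path in toy_paths.items()}
```

Vectors like these, paired with POS labels, could then be fed to any standard classifier such as a small MLP.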
3️⃣ Expert Specialization Results:
For example, for Phi-3.5-MoE-instruct, the specialization metric shows that some experts handle up to 72% of tokens in certain POS categories (e.g., punctuation), far exceeding the random expectation of 25%. (4/8)
2️⃣ Q1: POS Sensitivity
We introduced a specialization metric to measure deviation from uniform token routing. If routers didn’t consider POS, tokens would route evenly, but we found clear specialization, with certain experts disproportionately handling specific POS. (3/8)
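To make "deviation from uniform token routing" concrete, here is a minimal sketch of one way such a comparison could be computed for a single layer. The function names, the toy records, and the exact formula (share of a POS category's tokens handled by an expert, minus the uniform expectation `top_k / num_experts`) are assumptions for illustration and may differ from the metric defined in the paper.

```python
from collections import Counter, defaultdict


def routing_shares(records, num_experts):
    """For each POS tag, return the share of its tokens routed to each expert.

    `records` is an iterable of (pos_tag, expert_ids) pairs, where expert_ids
    lists the experts selected for that token in one layer.
    """
    counts = defaultdict(Counter)
    totals = Counter()
    for pos, experts in records:
        totals[pos] += 1
        for expert in set(experts):
            counts[pos][expert] += 1
    return {
        pos: [counts[pos][e] / totals[pos] for e in range(num_experts)]
        for pos in totals
    }


def max_specialization(shares, top_k, num_experts):
    """Largest excess over the uniform share (top_k / num_experts) per POS tag."""
    uniform = top_k / num_experts
    return {pos: max(per_expert) - uniform for pos, per_expert in shares.items()}


# Hypothetical toy data: 4 experts, 1 expert selected per token.
toy_records = [
    ("PUNCT", [0]), ("PUNCT", [0]), ("PUNCT", [0]), ("PUNCT", [1]),
    ("NOUN", [2]), ("NOUN", [3]),
]
toy_shares = routing_shares(toy_records, num_experts=4)
toy_excess = max_specialization(toy_shares, top_k=1, num_experts=4)
```

In this toy data, expert 0 handles 75% of punctuation tokens against a uniform expectation of 25%, so punctuation shows an excess of 0.5.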
1️⃣ Motivation:
Mixture of Experts (MoE) models route tokens to "experts," which are sub-groups of parameters.
We investigated:
• Q1: Are MoE routers sensitive to the part of speech (POS) of tokens?
• Q2: Does specialization occur in specific layers, or do routing paths encode token-level syntax? (2/8)
We are thrilled to announce that our paper, "Part-of-Speech Sensitivity of Routers in Mixture of Experts Models," has been accepted as a short paper at #COLING2025 in Abu Dhabi! 🎉 Our work addresses two interesting questions about MoE routers and linguistic behavior. 🧵 (1/8)
New here? Interested in AI/ML? Check out these great starter packs!
AI: go.bsky.app/SipA7it
RL: go.bsky.app/3WPHcHg
Women in AI: go.bsky.app/LaGDpqg
NLP: go.bsky.app/SngwGeS
AI and news: go.bsky.app/5sFqVNS
You can also search all starter packs here: blueskydirectory.com/starter-pack...