🔬Routing Weights (RW) are more robust to prompts than Hidden States (HS):
• RW outperforms HS on both robustness and performance across 9 prompts.
• 21% higher consistency: RW achieves 0.63 mean correlation vs. HS’s 0.52 with varying prompts.
🚀No training, Large gains:
• A 22.45% improvement (on DeepSeekMoE-16B) over standalone hidden states (HS)
• MoEE outperforms supervised models using PromptEOL on LLMs.
• LLM HS often performs worse than smaller encoder models trained specifically for embedding tasks!
💡Routing weights in MoE do more than select experts—the pathway chosen by LLMs captures semantics hidden states often miss.
🤯Your MoE LLM has two semantic systems—but you were only using one!
🔍In STS tasks, when one embedding fails, the other succeeds >50% of the time👇
1⃣Routing weights (RW) in MoE provide a training-free embedding complementary to the widely used hidden states (HS)
2⃣MoEE (RW + HS) beats standalone HS by +23% (minimal sketch below 👇)
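Not the authors' code, just a rough sketch of the idea: one forward pass yields both signals, and their similarities can be combined. Mixtral is used here only because HuggingFace Transformers exposes its router logits via output_router_logits (DeepSeekMoE-16B would need its own routing hook); the last-token/mean pooling and the 50/50 score mix are my assumptions, not necessarily the exact MoEE recipe.

```python
# Sketch: build a sentence embedding from an MoE LLM's hidden states (HS)
# and its routing weights (RW), then combine their cosine similarities.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "mistralai/Mixtral-8x7B-v0.1"  # assumed checkpoint, for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

def embed(text: str):
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs,
                    output_hidden_states=True,
                    output_router_logits=True)  # per-layer routing logits

    # HS embedding: last-token hidden state of the final layer (common LLM pooling).
    hs = out.hidden_states[-1][0, -1].float()                  # (hidden_dim,)

    # RW embedding: softmax each layer's router logits over experts, mean-pool
    # over tokens, concatenate layers -> a "which experts fired" signature.
    rw_layers = [torch.softmax(logits.float(), dim=-1).mean(dim=0)
                 for logits in out.router_logits]              # each (num_experts,)
    rw = torch.cat(rw_layers)                                  # (n_layers * num_experts,)
    return hs, rw

def moee_similarity(a: str, b: str, alpha: float = 0.5) -> float:
    """Score-level combination: weighted sum of HS and RW cosine similarities
    (alpha is a hypothetical mixing weight, not a value from the paper)."""
    hs_a, rw_a = embed(a)
    hs_b, rw_b = embed(b)
    cos = torch.nn.functional.cosine_similarity
    sim = alpha * cos(hs_a, hs_b, dim=0) + (1 - alpha) * cos(rw_a, rw_b, dim=0)
    return sim.item()

print(moee_similarity("A cat sits on the mat.", "A kitten rests on a rug."))
```

Score-level mixing is just one plausible way to fuse RW and HS; concatenating the two vectors before computing similarity is another, and the best choice likely depends on the task.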