Tianyi Zhou
@zhoutianyi.bsky.social
Assistant Professor of Computer Science at the University of Maryland and UMIACS; Research in AI, Machine Learning, NLP, Multi-modality; Previous: Research Scientist at Google; PhD from the University of Washington
3⃣
🔬Routing Weights (RW) are more robust to prompts than Hidden States (HS):
• RW outperforms HS on both robustness and performance across 9 prompts.
• 21% higher consistency: RW achieves a 0.63 mean correlation across varying prompts vs. HS's 0.52 (sketch below).
February 13, 2025 at 3:26 PM
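A minimal sketch of how such a prompt-consistency number can be computed, assuming robustness is scored as the mean pairwise Spearman correlation between the STS similarity scores produced under different prompts; the function and its input format are illustrative, not the paper's exact protocol:

```python
# Hedged sketch: prompt robustness as the mean pairwise Spearman
# correlation between similarity scores produced under different
# prompts. The input format is an assumption for illustration.
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr

def prompt_robustness(scores_per_prompt: list[np.ndarray]) -> float:
    """scores_per_prompt[i]: similarity scores over the same STS
    sentence pairs, computed with the i-th prompt variant."""
    corrs = [spearmanr(a, b).correlation
             for a, b in combinations(scores_per_prompt, 2)]
    return float(np.mean(corrs))
```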
2⃣
🚀No training, large gains:
• A 22.45% improvement on DeepSeekMoE-16B over standalone hidden states (HS)
• Combined with PromptEOL prompting, MoEE on LLMs outperforms supervised embedding models.
• LLM HS alone often performs worse than smaller encoder models trained specifically for embedding tasks!
February 13, 2025 at 3:26 PM
1⃣
💡Routing weights in MoE do more than select experts: the expert pathway the LLM chooses captures semantics that hidden states often miss.
🤯Your MoE LLM has two semantic systems—but you were only using one!
🔍In STS tasks, when one embedding fails, the other succeeds >50% of the time👇
February 13, 2025 at 3:26 PM
🚨We will present "Your MoE LLM Is Secretly an Embedding Model for Free" at #ICLR2025 Oral (1.77%) this April in Singapore:
1⃣Routing weights (RW) in MoE provide training-free embeddings complementary to the widely used hidden states (HS)
2⃣MoEE (RW + HS) beats standalone HS by +23% (sketch below)
February 13, 2025 at 3:26 PM
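A minimal sketch of the idea, assuming a Mixtral-style MoE checkpoint in Hugging Face transformers; the model name, mean pooling, and plain concatenation are illustrative assumptions rather than the paper's exact recipe (the paper evaluates models such as DeepSeekMoE-16B and also studies other ways of combining RW and HS):

```python
# Hedged sketch: a MoEE-style embedding that concatenates routing
# weights (RW) with hidden states (HS). Model choice, mean pooling,
# and plain concatenation are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mixtral-8x7B-v0.1"  # any MoE LM exposing router logits
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)
model.eval()

@torch.no_grad()
def moee_embed(text: str) -> torch.Tensor:
    batch = tok(text, return_tensors="pt")
    out = model(**batch, output_hidden_states=True, output_router_logits=True)
    # HS: mean-pool the last hidden layer over tokens -> (d_model,)
    hs = out.hidden_states[-1].mean(dim=1).squeeze(0)
    # RW: per layer, softmax the router logits into expert weights,
    # mean-pool over tokens, then concatenate across layers
    # -> (n_layers * n_experts,)
    rw = torch.cat([logits.softmax(dim=-1).mean(dim=0)
                    for logits in out.router_logits])
    # Normalize each view, then concatenate into one embedding
    return torch.cat([hs / hs.norm(), rw / rw.norm()])
```

Cosine similarity between two such embeddings then serves as the sentence-pair score on STS.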