🔬Routing Weights (RW) are more robust to prompts than Hidden States (HS):
• RW outperforms HS on both robustness and performance across 9 prompts.
• 21% higher consistency: RW achieves 0.63 mean correlation vs. HS’s 0.52 with varying prompts.
🚀No training, Large gains:
• A 22.45% improvement (on DeepSeekMoE-16B) over standalone hidden states (HS)
• MoEE outperforms supervised models using PromptEOL on LLMs.
• LLM HS often performs worse than smaller encoder models trained specifically for embedding tasks!
💡Routing weights in MoE do more than select experts—the pathway chosen by LLMs captures semantics hidden states often miss.
🤯Your MoE LLM has two semantic systems—but you were only using one!
🔍In STS tasks, when one embedding fails, the other succeeds >50% of the time👇
1⃣Routing weights (RW) in MoE provide a training-free embedding complementary to the widely used hidden states (HS)
2⃣MoEE (RW + HS) beats standalone HS by +23% (minimal sketch below 👇)
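Not the authors' code, just a rough sketch of the idea: one forward pass yields both signals, and their similarities can be combined. Mixtral is used here only because HuggingFace Transformers exposes its router logits via output_router_logits (DeepSeekMoE-16B would need its own routing hook); the last-token/mean pooling and the 50/50 score mix are my assumptions, not necessarily the exact MoEE recipe.

```python
# Sketch: build a sentence embedding from an MoE LLM's hidden states (HS)
# and its routing weights (RW), then combine their cosine similarities.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "mistralai/Mixtral-8x7B-v0.1"  # assumed checkpoint, for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

def embed(text: str):
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs,
                    output_hidden_states=True,
                    output_router_logits=True)  # per-layer routing logits

    # HS embedding: last-token hidden state of the final layer (common LLM pooling).
    hs = out.hidden_states[-1][0, -1].float()                  # (hidden_dim,)

    # RW embedding: softmax each layer's router logits over experts, mean-pool
    # over tokens, concatenate layers -> a "which experts fired" signature.
    rw_layers = [torch.softmax(logits.float(), dim=-1).mean(dim=0)
                 for logits in out.router_logits]              # each (num_experts,)
    rw = torch.cat(rw_layers)                                  # (n_layers * num_experts,)
    return hs, rw

def moee_similarity(a: str, b: str, alpha: float = 0.5) -> float:
    """Score-level combination: weighted sum of HS and RW cosine similarities
    (alpha is a hypothetical mixing weight, not a value from the paper)."""
    hs_a, rw_a = embed(a)
    hs_b, rw_b = embed(b)
    cos = torch.nn.functional.cosine_similarity
    sim = alpha * cos(hs_a, hs_b, dim=0) + (1 - alpha) * cos(rw_a, rw_b, dim=0)
    return sim.item()

print(moee_similarity("A cat sits on the mat.", "A kitten rests on a rug."))
```

Score-level mixing is just one plausible way to fuse RW and HS; concatenating the two vectors before computing similarity is another, and the best choice likely depends on the task.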