Tianyi Zhou
@zhoutianyi.bsky.social
Assistant Professor of Computer Science at the University of Maryland and UMIACS; Research in AI, Machine Learning, NLP, Multi-modality; Previous: Research Scientist at Google; PhD from the University of Washington
3⃣
🔬Routing Weights (RW) are more robust to prompts than Hidden States (HS):
• RW outperforms HS in both robustness and performance across 9 prompts.
• 21% higher consistency: RW reaches a mean correlation of 0.63 across varying prompts vs. 0.52 for HS (one way to compute such a consistency score is sketched below).
February 13, 2025 at 3:26 PM
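A rough illustration of how a prompt-consistency number like that could be computed (my sketch, not necessarily the paper's exact protocol): embed the same sentences under several prompt templates and correlate the sentence-pair similarity scores they produce.

```python
# Hedged sketch of one way to quantify prompt robustness: correlate the
# sentence-pair similarities an embedding yields under different prompts.
# A higher mean correlation means more consistent behaviour across prompts.
import numpy as np
from scipy.stats import spearmanr

def prompt_consistency(embs_by_prompt):
    """embs_by_prompt: list of (n_sentences, dim) arrays, one per prompt template."""
    def pair_sims(E):
        E = E / np.linalg.norm(E, axis=1, keepdims=True)   # cosine-normalize rows
        S = E @ E.T
        return S[np.triu_indices_from(S, k=1)]             # similarities of all pairs

    sims = [pair_sims(E) for E in embs_by_prompt]
    corrs = []
    for i in range(len(sims)):
        for j in range(i + 1, len(sims)):
            rho, _ = spearmanr(sims[i], sims[j])
            corrs.append(rho)
    return float(np.mean(corrs))
```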
2⃣
🚀No training, large gains:
• A 22.45% improvement (on DeepSeekMoE-16B) over standalone hidden states (HS)
• MoEE outperforms supervised models when paired with PromptEOL on LLMs.
• LLM hidden states often perform worse than smaller encoder models trained specifically for embedding tasks!
February 13, 2025 at 3:26 PM
1⃣
💡Routing weights in MoE do more than select experts—the pathway chosen by LLMs captures semantics hidden states often miss.
🤯Your MoE LLM has two semantic systems—but you were only using one!
🔍In STS tasks, when one embedding fails, the other succeeds >50% of the time👇
February 13, 2025 at 3:26 PM
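For intuition, here is a minimal sketch of pulling both views, hidden states (HS) and routing weights (RW), out of a Mixtral-style MoE with Hugging Face transformers. The model name, the mean pooling, and the per-layer concatenation are illustrative assumptions, not the paper's exact MoEE recipe; see the linked code for that.

```python
# Minimal sketch, not the authors' exact MoEE pipeline: extract the two
# embedding views (hidden states and routing weights) from an MoE LLM.
# Assumes a Mixtral-style model that exposes router logits via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mixtral-8x7B-v0.1"  # illustrative choice of MoE LLM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

@torch.no_grad()
def moe_embeddings(text: str):
    inputs = tok(text, return_tensors="pt")
    out = model(**inputs, output_hidden_states=True, output_router_logits=True)

    # Hidden-state (HS) view: last-layer states, mean-pooled over tokens.
    hs = out.hidden_states[-1].mean(dim=1).squeeze(0)        # (hidden_dim,)

    # Routing-weight (RW) view: expert probabilities per token, mean-pooled
    # over tokens, then concatenated across MoE layers.
    per_layer = []
    for logits in out.router_logits:                         # (seq_len, n_experts) each
        probs = torch.softmax(logits.float(), dim=-1)
        per_layer.append(probs.mean(dim=0))                  # (n_experts,)
    rw = torch.cat(per_layer)                                # (n_layers * n_experts,)
    return hs, rw

hs, rw = moe_embeddings("Routing weights carry semantics, too.")
```

The paper then combines the two views into MoEE (e.g., by concatenating them or combining their similarity scores); the exact recipe is in the linked code, so treat the pooling above as a placeholder.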
📎paper: openreview.net/forum?id=eFG...
📎#1 daily paper at HF: huggingface.co/papers/2410....
📎code: github.com/tianyi-lab/M...

Led by Ziyue Li at UMD CS, who is also the first author of MosT (#ICLR2025) and SMoA (#NAACL2025). 🔍Ziyue is looking for an internship.

More details👇
Your Mixture-of-Experts LLM Is Secretly an Embedding Model for Free
While large language models (LLMs) excel on generation tasks, their decoder-only architecture often limits their potential as embedding models if no further representation finetuning is applied....
February 13, 2025 at 3:26 PM