🔬Routing Weights (RW) are more robust to prompts than Hidden States (HS):
• RW outperforms HS across 9 prompts in both robustness and performance.
• 21% higher consistency: RW achieves 0.63 mean correlation vs. HS’s 0.52 with varying prompts.
🚀No training, Large gains:
• A 22.45% improvement (on DeepSeekMoE-16B) over standalone hidden states (HS)
• MoEE outperforms supervised models with PromptEOL on LLMs.
• LLM HS often performs worse than smaller encoder models specifically trained for embedding tasks!
💡Routing weights in MoE do more than select experts—the pathway chosen by LLMs captures semantics hidden states often miss.
🤯Your MoE LLM has two semantic systems—but you were only using one!
🔍In STS tasks, when one embedding fails, the other succeeds >50% of the time👇
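🧪For the curious, a minimal sketch of the idea (assuming a Mixtral-style MoE in 🤗 transformers; the model name, pooling, and the simple "sum of cosine similarities" combination are illustrative choices, not the authors' exact MoEE recipe — see the repo for that):

```python
# Sketch: pull both semantic signals (HS + RW) from one forward pass of a MoE LLM
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "mistralai/Mixtral-8x7B-v0.1"  # any MoE LLM that exposes router logits
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")

@torch.no_grad()
def embed(text):
    inputs = tok(text, return_tensors="pt").to(model.device)
    out = model(**inputs, output_hidden_states=True, output_router_logits=True)
    # Hidden-state (HS) embedding: mean-pool the last layer over tokens
    hs = out.hidden_states[-1].mean(dim=1).squeeze(0)
    # Routing-weight (RW) embedding: softmax each layer's router logits,
    # mean-pool over tokens, then concatenate across layers
    rw = torch.cat([F.softmax(l.float(), dim=-1).mean(dim=0) for l in out.router_logits])
    return hs, rw

def moee_sim(a, b):
    hs_a, rw_a = embed(a)
    hs_b, rw_b = embed(b)
    # Combine the two views: sum of cosine similarities (relative weighting is tunable)
    return (F.cosine_similarity(hs_a, hs_b, dim=0) + F.cosine_similarity(rw_a, rw_b, dim=0)).item()
```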
📎#1 daily paper at HF: huggingface.co/papers/2410....
📎code: github.com/tianyi-lab/M...
Led by Ziyue Li at UMD CS, who is also the 1st author of MosT #ICLR2025 and SMoA #NAACL2025. 🔍Ziyue is looking for an internship.
More details👇