Explainable AI Researcher
zootime.bsky.social
I work on explainable AI at a German research facility
Reposted by Explainable AI Researcher
🚀 New Paper Alert! 🚀

We introduce Q-Filters, a training-free method for efficient KV Cache compression!

It is compatible with FlashAttention and can compress the KV cache during generation, which is particularly useful for reasoning models ⚡

TLDR: we make Streaming-LLM smarter using the geometry of attention
March 6, 2025 at 4:02 PM