simjeg.bsky.social
@simjeg.bsky.social
Senior LLM Technologist @NVIDIA
Views and opinions are my own
Fresh news from kvpress, our open source library for KV cache compression 🔥

1. We published a blog post with @huggingface
2. We published a Space for you to try it
3. Following feedback from the research community, we added a bunch of presses and benchmarks

Links👇(1/2)
January 23, 2025 at 10:03 AM
💡 We've just released KV cache quantization in kvpress, our open source package for KV cache compression. Check it out: github.com/NVIDIA/kvpress.

Special thanks to Arthur Zucker and Marc Sun from @huggingface.bsky.social for their support 🤗
November 26, 2024 at 1:24 PM
Hidden states in LLMs approximately follow normal distributions. Consequently, queries and keys also follow normal distributions, and if you replace every query and key by its average counterpart, this magically explains the slash pattern observed in attention matrices.
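(A minimal sketch of the argument, mine rather than from the post, assuming a RoPE-style positional encoding: if every query and key is replaced by the same average vector and then rotated by its position, the score between positions i and j depends only on i - j, so every diagonal of the score matrix is constant, which is exactly the slash pattern.)

```python
# Minimal sketch: averaged queries/keys + RoPE => scores depend only on i - j (slash pattern).
# Toy dimensions and a simple interleaved RoPE implementation; not code from kvpress.
import numpy as np

d, n = 64, 128                      # head dim, sequence length (toy values)
rng = np.random.default_rng(0)

q_bar = rng.normal(size=d)          # average query vector (assumed Gaussian)
k_bar = rng.normal(size=d)          # average key vector (assumed Gaussian)

inv_freq = 1.0 / 10000 ** (np.arange(0, d, 2) / d)

def rope(x, p):
    """Rotate vector x by the RoPE angles of position p (interleaved pairs)."""
    angles = p * inv_freq
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out

Q = np.stack([rope(q_bar, p) for p in range(n)])   # (n, d): same vector, rotated per position
K = np.stack([rope(k_bar, p) for p in range(n)])   # (n, d)
scores = Q @ K.T / np.sqrt(d)                      # (n, n) attention scores

# Every diagonal of the score matrix is constant: the "slash" pattern.
print(np.allclose(np.diag(scores, -1), scores[1, 0]))  # True
```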
November 20, 2024 at 10:06 AM
I created a DistillationPress that distills the (K,V) cache into a compressed (Kc,Vc) cache by minimizing ||A(q,K,V) - A(q,Kc,Vc)||^2. Check out my notebook here: github.com/NVIDIA/kvpre.... More work needs to be done; it's just a first step (3/3)
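(A minimal PyTorch sketch of that objective, not the actual DistillationPress code from the notebook: Kc and Vc are learnable tensors with fewer entries than K and V, optimized so that attention outputs computed with the compressed cache match the original ones. Dimensions and the random-query batch are illustrative assumptions.)

```python
# Sketch of the distillation objective ||A(q, K, V) - A(q, Kc, Vc)||^2 with toy tensors.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, n, nc = 64, 256, 64              # head dim, original and compressed cache lengths (toy values)

K = torch.randn(n, d)               # original key cache
V = torch.randn(n, d)               # original value cache
Kc = torch.randn(nc, d, requires_grad=True)   # compressed key cache (learnable)
Vc = torch.randn(nc, d, requires_grad=True)   # compressed value cache (learnable)

def attention(q, K, V):
    """A(q, K, V) = softmax(q K^T / sqrt(d)) V"""
    return F.softmax(q @ K.T / d ** 0.5, dim=-1) @ V

optimizer = torch.optim.Adam([Kc, Vc], lr=1e-2)
for step in range(500):
    q = torch.randn(32, d)          # random queries here; a real press would use the model's queries
    loss = F.mse_loss(attention(q, Kc, Vc), attention(q, K, V))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final distillation loss: {loss.item():.4f}")
```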
November 20, 2024 at 9:55 AM
🚀 Excited to announce KVPress — our open-source library for efficient LLM KV cache compression!
👉 Check it out (and drop a ⭐): github.com/NVIDIA/kvpress
🔗 Full details in the thread 🧵 (1/4)
November 19, 2024 at 2:25 PM