simjeg.bsky.social
@simjeg.bsky.social
Senior LLM Technologist @NVIDIA
Views and opinions are my own
Fresh news from kvpress, our open source library for KV cache compression 🔥

1. We published a blog post with @huggingface
2. We published a Space for you to try it
3. Following feedback from the research community, we added a bunch of presses and benchmarks

Links👇(1/2)
January 23, 2025 at 10:03 AM
💡 We've just released KV cache quantization in kvpress, our open source package for KV cache compression. Check it out: github.com/NVIDIA/kvpress.

Special thanks to Arthur Zucker and Marc Sun from @huggingface.bsky.social for their support 🤗
November 26, 2024 at 1:24 PM
Hidden states in LLMs approximately follow normal distributions. Consequently, queries and keys also follow normal distributions, and if you replace every query and key by its average counterpart, this magically explains the slash pattern observed in attention matrices.
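(A minimal sketch of the argument, mine rather than from the post, assuming a RoPE-style positional encoding: if every query and key is replaced by the same average vector and then rotated by its position, the score between positions i and j depends only on i - j, so every diagonal of the score matrix is constant, which is exactly the slash pattern.)

```python
# Minimal sketch: averaged queries/keys + RoPE => scores depend only on i - j (slash pattern).
# Toy dimensions and a simple interleaved RoPE implementation; not code from kvpress.
import numpy as np

d, n = 64, 128                      # head dim, sequence length (toy values)
rng = np.random.default_rng(0)

q_bar = rng.normal(size=d)          # average query vector (assumed Gaussian)
k_bar = rng.normal(size=d)          # average key vector (assumed Gaussian)

inv_freq = 1.0 / 10000 ** (np.arange(0, d, 2) / d)

def rope(x, p):
    """Rotate vector x by the RoPE angles of position p (interleaved pairs)."""
    angles = p * inv_freq
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out

Q = np.stack([rope(q_bar, p) for p in range(n)])   # (n, d): same vector, rotated per position
K = np.stack([rope(k_bar, p) for p in range(n)])   # (n, d)
scores = Q @ K.T / np.sqrt(d)                      # (n, n) attention scores

# Every diagonal of the score matrix is constant: the "slash" pattern.
print(np.allclose(np.diag(scores, -1), scores[1, 0]))  # True
```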
November 20, 2024 at 10:06 AM
I created a DistillationPress that distills the (K,V) cache into a compressed (Kc,Vc) cache by minimizing ||A(q,K,V) - A(q,Kc,Vc)||^2. Check out my notebook here: github.com/NVIDIA/kvpre.... More work needs to be done; it's just a first step (3/3)
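(A minimal PyTorch sketch of that objective, not the actual DistillationPress code from the notebook: Kc and Vc are learnable tensors with fewer entries than K and V, optimized so that attention outputs computed with the compressed cache match the original ones. Dimensions and the random-query batch are illustrative assumptions.)

```python
# Sketch of the distillation objective ||A(q, K, V) - A(q, Kc, Vc)||^2 with toy tensors.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, n, nc = 64, 256, 64              # head dim, original and compressed cache lengths (toy values)

K = torch.randn(n, d)               # original key cache
V = torch.randn(n, d)               # original value cache
Kc = torch.randn(nc, d, requires_grad=True)   # compressed key cache (learnable)
Vc = torch.randn(nc, d, requires_grad=True)   # compressed value cache (learnable)

def attention(q, K, V):
    """A(q, K, V) = softmax(q K^T / sqrt(d)) V"""
    return F.softmax(q @ K.T / d ** 0.5, dim=-1) @ V

optimizer = torch.optim.Adam([Kc, Vc], lr=1e-2)
for step in range(500):
    q = torch.randn(32, d)          # random queries here; a real press would use the model's queries
    loss = F.mse_loss(attention(q, Kc, Vc), attention(q, K, V))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final distillation loss: {loss.item():.4f}")
```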
November 20, 2024 at 9:55 AM
🚀 Excited to announce KVPress — our open-source library for efficient LLM KV cache compression!
👉 Check it out (and drop a ⭐): github.com/NVIDIA/kvpress
🔗 Full details in the thread 🧵 (1/4)
November 19, 2024 at 2:25 PM