Views and opinions are my own
I built a @gradio-hf.bsky.social app so you can try it yourself: huggingface.co/spaces/simon...
Implementation is based on the excellent paper "An Optimal Strategy for Yahtzee" (Glenn, 2006)
1. We published a blog post with @huggingface
2. We published a Space for you to try it
3. Following feedback from the research community, we added a bunch of presses and benchmarks
Links👇(1/2)
Special thanks to Arthur Zucker and Marc Sun from @huggingface.bsky.social for their support 🤗
👉 Check it out (and drop a ⭐): github.com/NVIDIA/kvpress
🔗 Full details in the thread 🧵 (1/4)
A(q, K, V) = V @ softmax(K / √d @ q)
Weights: K / √d and V
nonlinearity: softmax
💡This offers fresh insights into KV cache compression research 🧵(1/3)
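A minimal sketch of that reading, in plain Python. The shapes are my assumption, not something the post spells out: q has length d, K is n x d (one key per row), and V is d_v x n (one value vector per column) so that V @ softmax(...) type-checks. The helper names (softmax, matvec, attention_as_mlp) are made up for illustration.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, v):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def attention_as_mlp(q, K, V):
    """Single-query attention written as a 2-layer MLP.

    Assumed shapes: q is (d,), K is (n, d), V is (d_v, n).
    """
    d = len(q)
    # Layer 1 weights: K / sqrt(d), one "hidden unit" per cached key.
    W1 = [[k / math.sqrt(d) for k in row] for row in K]
    # Nonlinearity: softmax over the n hidden units.
    hidden = softmax(matvec(W1, q))
    # Layer 2 weights: V, mixing the value vectors.
    return matvec(V, hidden), hidden

K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # n = 3 keys, d = 2
V = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]     # d_v = 2, one column per key

out, hidden = attention_as_mlp([1.0, 0.5], K, V)

# Reference: the usual weighted sum of value vectors (columns of V).
reference = [sum(w * V[r][i] for i, w in enumerate(hidden)) for r in range(len(V))]

# With a zero query every key scores the same, so the output is the mean value.
uniform_out, uniform_hidden = attention_as_mlp([0.0, 0.0], K, V)
```

Seen this way, the hidden width of the MLP is the number of cached keys n, so dropping a (key, value) pair from the cache amounts to removing one hidden unit.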