simjeg.bsky.social
@simjeg.bsky.social
Senior LLM Technologist @NVIDIA
Views and opinions are my own
🎲 Did you know Yahtzee can be solved optimally in less than 100 lines of Python, in under 5 minutes on 2 vCPUs?

I built a @gradio-hf.bsky.social app so you can try it yourself: huggingface.co/spaces/simon...

The implementation is based on the excellent paper "An Optimal Strategy for Yahtzee" (Glenn, 2006)
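For a flavor of how it works (a scaled-down sketch, not the app's actual code): the solver is a backward induction over game states, and full Yahtzee has only ~2¹³ × 64 (category subset, upper score) states, which is why exact solving is cheap. The sketch below assumes one roll per turn and just four categories; the real solver adds an expectimax widget over the two rerolls within each turn and tracks the upper-section bonus.

```python
# Scaled-down sketch of the backward induction in Glenn (2006), assuming
# one roll per turn (no rerolls, no upper bonus) and only four categories.
from collections import Counter
from functools import lru_cache
from itertools import combinations_with_replacement
from math import factorial

DICE, SIDES = 5, 6
CATEGORIES = ("sixes", "three_kind", "yahtzee", "chance")

def score(roll, category):
    """Score a sorted roll of 5 dice against one category."""
    counts = Counter(roll)
    return {
        "sixes": 6 * counts[6],
        "three_kind": sum(roll) if max(counts.values()) >= 3 else 0,
        "yahtzee": 50 if max(counts.values()) == 5 else 0,
        "chance": sum(roll),
    }[category]

# The 252 distinct sorted rolls and their probabilities.
ROLLS = []
for roll in combinations_with_replacement(range(1, SIDES + 1), DICE):
    perms = factorial(DICE)
    for n in Counter(roll).values():
        perms //= factorial(n)
    ROLLS.append((roll, perms / SIDES**DICE))

@lru_cache(maxsize=None)
def value(remaining):
    """Expected final score given the frozenset of unused categories."""
    if not remaining:
        return 0.0
    # For each roll, pick the category maximizing
    # immediate score + expected value of the remaining game.
    return sum(
        p * max(score(roll, c) + value(remaining - {c}) for c in remaining)
        for roll, p in ROLLS
    )

print(f"Expected score: {value(frozenset(CATEGORIES)):.2f}")
```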
March 31, 2025 at 3:07 PM
Fresh news from kvpress, our open source library for KV cache compression 🔥

1. We published a blog post with @huggingface
2. We published a Space for you to try it
3. Following feedback from the research community, we added a bunch of presses and benchmarks

Links👇(1/2)
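For context, applying a press takes a few lines; a rough usage sketch following the kvpress README at the time (model name and compression ratio below are just examples, check the repo for the current API):

```python
# Rough usage sketch based on the kvpress README.
# A "press" compresses the KV cache during the prefilling phase.
from transformers import pipeline
from kvpress import ExpectedAttentionPress

pipe = pipeline(
    "kv-press-text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # example model
    device="cuda",
)

context = "A long document whose KV cache we want to compress ..."
question = "What is this document about?"

# Evict 50% of the KV cache during prefill, then decode as usual.
press = ExpectedAttentionPress(compression_ratio=0.5)
answer = pipe(context, question=question, press=press)["answer"]
```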
January 23, 2025 at 10:03 AM
How do you find the permutation of words that minimizes their perplexity as measured by an LLM? In this year's Kaggle Santa competition, I shared an approach that moves the search to a continuous space where you can use gradient descent via REINFORCE: www.kaggle.com/code/simjeg/...
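The trick in a nutshell (an illustrative sketch, not the notebook's code; `perplexity` below is a hypothetical stand-in for the LLM scorer): learn one logit per word, sample permutations Plackett-Luce style, and use REINFORCE to push the logits toward low-perplexity orderings.

```python
# Illustrative REINFORCE-over-permutations sketch. `perplexity` is a
# hypothetical stand-in for an LLM-based scorer.
import torch

words = ["reindeer", "the", "pulls", "sleigh", "the"]
theta = torch.zeros(len(words), requires_grad=True)  # one logit per word
opt = torch.optim.Adam([theta], lr=0.1)

def perplexity(perm):
    # Hypothetical black-box reward: plug an LLM call in here.
    return hash(tuple(perm)) % 100

def sample_permutation(logits):
    """Sample a permutation and its log-probability (Plackett-Luce)."""
    remaining = list(range(len(logits)))
    perm, logp = [], torch.tensor(0.0)
    while remaining:
        probs = torch.softmax(logits[remaining], dim=0)
        i = torch.multinomial(probs, 1).item()
        logp = logp + torch.log(probs[i])
        perm.append(remaining.pop(i))
    return perm, logp

baseline = 0.0
for step in range(500):
    perm, logp = sample_permutation(theta)
    reward = -float(perplexity(perm))   # lower perplexity = higher reward
    baseline = 0.9 * baseline + 0.1 * reward
    loss = -(reward - baseline) * logp  # REINFORCE with a moving baseline
    opt.zero_grad()
    loss.backward()
    opt.step()

best = [words[i] for i in torch.argsort(theta, descending=True).tolist()]
print(" ".join(best))
```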
December 3, 2024 at 12:40 PM
💡 We've just released KV cache quantization in kvpress, our open-source package for KV cache compression. Check it out: github.com/NVIDIA/kvpress.

Special thanks to Arthur Zucker and Marc Sun from @huggingface.bsky.social for their support 🤗
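The idea in a nutshell (an illustrative sketch, not kvpress's actual implementation, which plugs into the quantized caches of 🤗 transformers): store K and V in int8 with one scale per token and dequantize on the fly, roughly halving cache memory vs fp16 (more with lower-bit schemes).

```python
# Illustrative sketch of KV cache quantization, not kvpress's code:
# store K/V in int8 with one absmax scale per token, dequantize on use.
import torch

def quantize(x):
    """Per-token absmax int8 quantization; x is (seq_len, head_dim)."""
    scale = x.abs().amax(dim=-1, keepdim=True) / 127.0
    q = (x / scale).round().clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.to(scale.dtype) * scale

keys = torch.randn(1024, 128, dtype=torch.float16)  # (seq_len, head_dim)
q, scale = quantize(keys)
err = (dequantize(q, scale) - keys).abs().mean()
print(f"mean abs error: {err.item():.4f}")  # small, for ~2x memory savings
```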
November 26, 2024 at 1:24 PM
Hidden states in LLMs approximately follow normal distributions. Consequently, queries and keys do too, and if you replace all queries and keys by their average counterparts, this magically explains the slash pattern observed in attention matrices
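A quick numerical check of the claim (a toy sketch, assuming RoPE positional encoding): with every query and key replaced by a fixed mean vector, the attention score between positions i and j depends only on i − j, i.e. the score matrix is Toeplitz, which is exactly the diagonal "slash" stripes.

```python
# Toy check: mean query/key + RoPE => scores depend only on i - j.
import numpy as np

d, n = 64, 32
rng = np.random.default_rng(0)
q_bar = rng.normal(size=d)  # mean query, reused at every position
k_bar = rng.normal(size=d)  # mean key

def rope(x, pos):
    """Rotary position embedding: rotate consecutive dim pairs."""
    freqs = 10000.0 ** (-np.arange(0, d, 2) / d)
    ang = pos * freqs
    out = x.copy()
    out[0::2] = x[0::2] * np.cos(ang) - x[1::2] * np.sin(ang)
    out[1::2] = x[0::2] * np.sin(ang) + x[1::2] * np.cos(ang)
    return out

scores = np.array([[rope(q_bar, i) @ rope(k_bar, j) for j in range(n)]
                   for i in range(n)]) / np.sqrt(d)

# Every diagonal is constant: scores[i, j] only depends on i - j.
print(np.allclose(scores[1:, 1:], scores[:-1, :-1]))  # True
```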
November 20, 2024 at 10:06 AM
Ever noticed that the attention mechanism in transformers is essentially a two-layer MLP? 🤔
A(q, K, V) = Vᵀ @ softmax(K @ q / √d)
Weights: K / √d and Vᵀ
Nonlinearity: softmax
💡 This offers fresh insights into KV cache compression research 🧵(1/3)
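In code, for a single query (a minimal numpy check of the identity above):

```python
# Single-query attention written as a two-layer MLP, with
# W1 = K / sqrt(d) (first layer) and W2 = V.T (second layer).
import numpy as np

n, d = 8, 16
rng = np.random.default_rng(0)
q = rng.normal(size=d)        # one query
K = rng.normal(size=(n, d))   # keys
V = rng.normal(size=(n, d))   # values

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

h = softmax(K @ q / np.sqrt(d))   # layer 1 + nonlinearity
out = V.T @ h                     # layer 2

# Matches the standard attention output for this query.
print(np.allclose(out, softmax(q @ K.T / np.sqrt(d)) @ V))  # True
```

The twist vs a real MLP: the "weights" K and V are built from the input sequence itself, so pruning tokens from the KV cache is pruning hidden units from this MLP.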
November 20, 2024 at 9:55 AM
🚀 Excited to announce KVPress — our open-source library for efficient LLM KV cache compression!
👉 Check it out (and drop a ⭐): github.com/NVIDIA/kvpress
🔗 Full details in the thread 🧵 (1/4)
November 19, 2024 at 2:25 PM