simjeg.bsky.social
@simjeg.bsky.social
Senior LLM Technologist @NVIDIA
Views and opinions are my own
🎲 Did you know Yahtzee can be solved optimally in less than 100 lines of Python, in under 5 minutes on 2 vCPUs?

I built a @gradio-hf.bsky.social app so you can try it yourself: huggingface.co/spaces/simon...

The implementation is based on the excellent paper "An Optimal Strategy for Yahtzee" (Glenn, 2006)
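For a flavor of how it works (a scaled-down sketch, not the app's actual code): the solver is a backward induction over game states, and full Yahtzee has only ~2¹³ × 64 (category subset, upper score) states, which is why exact solving is cheap. The sketch below assumes one roll per turn and just four categories; the real solver adds an expectimax widget over the two rerolls within each turn and tracks the upper-section bonus.

```python
# Scaled-down sketch of the backward induction in Glenn (2006), assuming
# one roll per turn (no rerolls, no upper bonus) and only four categories.
from collections import Counter
from functools import lru_cache
from itertools import combinations_with_replacement
from math import factorial

DICE, SIDES = 5, 6
CATEGORIES = ("sixes", "three_kind", "yahtzee", "chance")

def score(roll, category):
    """Score a sorted roll of 5 dice against one category."""
    counts = Counter(roll)
    return {
        "sixes": 6 * counts[6],
        "three_kind": sum(roll) if max(counts.values()) >= 3 else 0,
        "yahtzee": 50 if max(counts.values()) == 5 else 0,
        "chance": sum(roll),
    }[category]

# The 252 distinct sorted rolls and their probabilities.
ROLLS = []
for roll in combinations_with_replacement(range(1, SIDES + 1), DICE):
    perms = factorial(DICE)
    for n in Counter(roll).values():
        perms //= factorial(n)
    ROLLS.append((roll, perms / SIDES**DICE))

@lru_cache(maxsize=None)
def value(remaining):
    """Expected final score given the frozenset of unused categories."""
    if not remaining:
        return 0.0
    # For each roll, pick the category maximizing
    # immediate score + expected value of the remaining game.
    return sum(
        p * max(score(roll, c) + value(remaining - {c}) for c in remaining)
        for roll, p in ROLLS
    )

print(f"Expected score: {value(frozenset(CATEGORIES)):.2f}")
```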
March 31, 2025 at 3:07 PM
Fresh news from kvpress, our open source library for KV cache compression 🔥

1. We published a blog post with @huggingface
2. We published a Space for you to try it
3. Following feedback from the research community, we added a bunch of presses and benchmarks

Links👇(1/2)
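For context, applying a press takes a few lines; a rough usage sketch following the kvpress README at the time (model name and compression ratio below are just examples, check the repo for the current API):

```python
# Rough usage sketch based on the kvpress README.
# A "press" compresses the KV cache during the prefilling phase.
from transformers import pipeline
from kvpress import ExpectedAttentionPress

pipe = pipeline(
    "kv-press-text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # example model
    device="cuda",
)

context = "A long document whose KV cache we want to compress ..."
question = "What is this document about?"

# Evict 50% of the KV cache during prefill, then decode as usual.
press = ExpectedAttentionPress(compression_ratio=0.5)
answer = pipe(context, question=question, press=press)["answer"]
```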
January 23, 2025 at 10:03 AM
How do you find the permutation of words that minimizes their perplexity as measured by an LLM? In this year's Kaggle Santa competition, I shared an approach that moves the search to a continuous space where you can use gradient descent via REINFORCE: www.kaggle.com/code/simjeg/...
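The trick in a nutshell (an illustrative sketch, not the notebook's code; `perplexity` below is a hypothetical stand-in for the LLM scorer): learn one logit per word, sample permutations Plackett-Luce style, and use REINFORCE to push the logits toward low-perplexity orderings.

```python
# Illustrative REINFORCE-over-permutations sketch. `perplexity` is a
# hypothetical stand-in for an LLM-based scorer.
import torch

words = ["reindeer", "the", "pulls", "sleigh", "the"]
theta = torch.zeros(len(words), requires_grad=True)  # one logit per word
opt = torch.optim.Adam([theta], lr=0.1)

def perplexity(perm):
    # Hypothetical black-box reward: plug an LLM call in here.
    return hash(tuple(perm)) % 100

def sample_permutation(logits):
    """Sample a permutation and its log-probability (Plackett-Luce)."""
    remaining = list(range(len(logits)))
    perm, logp = [], torch.tensor(0.0)
    while remaining:
        probs = torch.softmax(logits[remaining], dim=0)
        i = torch.multinomial(probs, 1).item()
        logp = logp + torch.log(probs[i])
        perm.append(remaining.pop(i))
    return perm, logp

baseline = 0.0
for step in range(500):
    perm, logp = sample_permutation(theta)
    reward = -float(perplexity(perm))   # lower perplexity = higher reward
    baseline = 0.9 * baseline + 0.1 * reward
    loss = -(reward - baseline) * logp  # REINFORCE with a moving baseline
    opt.zero_grad()
    loss.backward()
    opt.step()

best = [words[i] for i in torch.argsort(theta, descending=True).tolist()]
print(" ".join(best))
```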
December 3, 2024 at 12:40 PM
💡 We've just released KV cache quantization in kvpress, our open-source package for KV cache compression. Check it out: github.com/NVIDIA/kvpress.

Special thanks to Arthur Zucker and Marc Sun from @huggingface.bsky.social for their support 🤗
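The idea in a nutshell (an illustrative sketch, not kvpress's actual implementation, which plugs into the quantized caches of 🤗 transformers): store K and V in int8 with one scale per token and dequantize on the fly, roughly halving cache memory vs fp16 (more with lower-bit schemes).

```python
# Illustrative sketch of KV cache quantization, not kvpress's code:
# store K/V in int8 with one absmax scale per token, dequantize on use.
import torch

def quantize(x):
    """Per-token absmax int8 quantization; x is (seq_len, head_dim)."""
    scale = x.abs().amax(dim=-1, keepdim=True) / 127.0
    q = (x / scale).round().clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.to(scale.dtype) * scale

keys = torch.randn(1024, 128, dtype=torch.float16)  # (seq_len, head_dim)
q, scale = quantize(keys)
err = (dequantize(q, scale) - keys).abs().mean()
print(f"mean abs error: {err.item():.4f}")  # small, for ~2x memory savings
```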
November 26, 2024 at 1:24 PM
Hidden states in LLMs approximately follow normal distributions. Consequently, queries and keys do too, and if you replace all queries and keys by their average counterparts, this magically explains the slash pattern observed in attention matrices
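A quick numerical check of the claim (a toy sketch, assuming RoPE positional encoding): with every query and key replaced by a fixed mean vector, the attention score between positions i and j depends only on i − j, i.e. the score matrix is Toeplitz, which is exactly the diagonal "slash" stripes.

```python
# Toy check: mean query/key + RoPE => scores depend only on i - j.
import numpy as np

d, n = 64, 32
rng = np.random.default_rng(0)
q_bar = rng.normal(size=d)  # mean query, reused at every position
k_bar = rng.normal(size=d)  # mean key

def rope(x, pos):
    """Rotary position embedding: rotate consecutive dim pairs."""
    freqs = 10000.0 ** (-np.arange(0, d, 2) / d)
    ang = pos * freqs
    out = x.copy()
    out[0::2] = x[0::2] * np.cos(ang) - x[1::2] * np.sin(ang)
    out[1::2] = x[0::2] * np.sin(ang) + x[1::2] * np.cos(ang)
    return out

scores = np.array([[rope(q_bar, i) @ rope(k_bar, j) for j in range(n)]
                   for i in range(n)]) / np.sqrt(d)

# Every diagonal is constant: scores[i, j] only depends on i - j.
print(np.allclose(scores[1:, 1:], scores[:-1, :-1]))  # True
```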
November 20, 2024 at 10:06 AM
Ever noticed that the attention mechanism in transformers is essentially a two-layer MLP? 🤔
A(q, K, V) = Vᵀ @ softmax(K @ q / √d)
Weights: K / √d and Vᵀ
Nonlinearity: softmax
💡 This offers fresh insights into KV cache compression research 🧵(1/3)
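In code, for a single query (a minimal numpy check of the identity above):

```python
# Single-query attention written as a two-layer MLP, with
# W1 = K / sqrt(d) (first layer) and W2 = V.T (second layer).
import numpy as np

n, d = 8, 16
rng = np.random.default_rng(0)
q = rng.normal(size=d)        # one query
K = rng.normal(size=(n, d))   # keys
V = rng.normal(size=(n, d))   # values

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

h = softmax(K @ q / np.sqrt(d))   # layer 1 + nonlinearity
out = V.T @ h                     # layer 2

# Matches the standard attention output for this query.
print(np.allclose(out, softmax(q @ K.T / np.sqrt(d)) @ V))  # True
```

The twist vs a real MLP: the "weights" K and V are built from the input sequence itself, so pruning tokens from the KV cache is pruning hidden units from this MLP.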
November 20, 2024 at 9:55 AM
🚀 Excited to announce KVPress — our open-source library for efficient LLM KV cache compression!
👉 Check it out (and drop a ⭐): github.com/NVIDIA/kvpress
🔗 Full details in the thread 🧵 (1/4)
November 19, 2024 at 2:25 PM