🎥🎧👽
I wrote a blog post about it (with memes). See 🧵
A(q, K, V) = V @ softmax(K / √d @ q)
Weights: K / √d and V
Nonlinearity: softmax
💡This offers fresh insights into KV cache compression research 🧵(1/3)
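A minimal sketch of the formula above in plain Python, read as a two-layer network: the first layer's weights are K / √d, the nonlinearity is softmax, and the second layer's weights are V. The shapes are an assumption, not something the post states: K is taken as n × d (one key per row) and V as d_v × n (one value per column) so that V @ softmax(...) type-checks. The helper names `softmax`, `matvec`, and `attention` are illustrative only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def attention(q, K, V):
    """A(q, K, V) = V @ softmax(K / sqrt(d) @ q) for a single query q.

    Assumed shapes: q has length d, K is n x d, V is d_v x n.
    """
    d = len(q)
    # "First layer": weights K / sqrt(d) applied to the input q.
    W1 = [[k / math.sqrt(d) for k in row] for row in K]
    # Nonlinearity: softmax over the n pre-activations.
    hidden = softmax(matvec(W1, q))
    # "Second layer": weights V applied to the hidden activations.
    return matvec(V, hidden)


if __name__ == "__main__":
    q = [1.0, 0.0]
    K = [[1.0, 0.0], [0.0, 1.0]]   # two keys, d = 2
    V = [[1.0, 0.0], [0.0, 1.0]]   # value columns are one-hot, d_v = 2
    print(attention(q, K, V))
```

In this reading the KV cache plays the role of the layer's weights and the number of cached tokens n is the hidden width, which is one way to see why cache compression resembles shrinking a hidden layer.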
* no hidden state, deterministic execution
* execute as a Python script, parametrized by CLI args
* git-friendly: notebooks are stored as .py files
Have you tried it? How is it? github.com/marimo-team/...