What should you do 🤔... quantise to NF4? 🧵
It's called Optimal Formats for Weight Quantisation and has just hit arXiv.
1/6
The Byte Latent Transformer, Large Concept Models, Memory Layers & Phi-4 — all grouped under the title "Spend Your FLOPs Wisely". Here's our take (🧵)
graphcore-research.github.io/papers-of-th...
douglasorr.itch.io/c-crits