https://hychiang.info/
🔧 Supports W4A8 / W4A16 / W4AX / W8A8 for Mamba1 and Mamba2
🚀 Achieves 4× memory reduction and 3× generation speedup
⚡️ Enables 8B model inference on Orin Nano 8G at 13 tokens/sec
🔥 Outperforms W4A8KV4 Llama3-8B in both speed and quality
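To unpack the notation above: "W4A8" means weights stored in 4-bit integers with activations in 8-bit, "W4A16" keeps activations in 16-bit floats, and so on. As a rough illustration (not this project's actual kernels), the 4-bit weight part can be sketched as symmetric per-channel fake quantization, where `quantize_w4` and its 4-bit range are illustrative assumptions:

```python
import numpy as np

def quantize_w4(w: np.ndarray, axis: int = 0):
    """Symmetric per-channel 4-bit quantization: integers in [-8, 7]."""
    qmax = 7
    # One scale per output channel, chosen so the largest weight maps to qmax.
    scale = np.max(np.abs(w), axis=axis, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero channels
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate float weight matrix from int4 values + scales."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s = quantize_w4(w)
w_hat = dequantize(q, s)
# Round-to-nearest bounds the per-element error by half a quantization step.
max_err = np.max(np.abs(w - w_hat))
```

Storing `q` instead of `w` is where the roughly 4× memory reduction over 16-bit weights comes from; the real speedups additionally rely on fused low-bit matmul kernels.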