for those of us without datacenter GPUs:
• NF4 quantization - 4-bit, 7B models in ~4GB VRAM
• RWKV-7 - linear-time recurrence instead of O(n²) attention
• LoRA MoE - blend adapters at runtime
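A minimal sketch of the first bullet, NF4 (4-bit NormalFloat) quantization: each block of weights is scaled by its absolute maximum, then every value is snapped to the nearest of 16 fixed levels placed at quantiles of a standard normal. The level constants below are rounded from the published QLoRA NF4 table; this is an illustration of the scheme, not the repo's actual kernel.

```rust
// NF4 levels, rounded from the QLoRA paper's published table (assumption:
// approximate values; a real implementation uses the full-precision table).
const NF4_LEVELS: [f32; 16] = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
];

/// Quantize one block: returns a 4-bit code per weight plus the block scale.
fn nf4_quantize(block: &[f32]) -> (Vec<u8>, f32) {
    let absmax = block.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = if absmax == 0.0 { 1.0 } else { absmax };
    let codes = block
        .iter()
        .map(|&x| {
            let n = x / scale; // normalized into [-1, 1]
            // nearest-level search over the 16 NF4 levels
            let mut best = 0u8;
            let mut best_d = f32::MAX;
            for (i, &lvl) in NF4_LEVELS.iter().enumerate() {
                let d = (n - lvl).abs();
                if d < best_d {
                    best_d = d;
                    best = i as u8;
                }
            }
            best
        })
        .collect();
    (codes, scale)
}

/// Dequantize: look up each 4-bit code and rescale by the block scale.
fn nf4_dequantize(codes: &[u8], scale: f32) -> Vec<f32> {
    codes.iter().map(|&c| NF4_LEVELS[c as usize] * scale).collect()
}

fn main() {
    let w = [0.12f32, -0.53, 0.91, -0.07, 0.33, -0.88, 0.02, 0.61];
    let (codes, scale) = nf4_quantize(&w);
    let back = nf4_dequantize(&codes, scale);
    println!("codes = {:?}, scale = {}", codes, scale);
    println!("roundtrip = {:?}", back);
}
```

Each weight costs 4 bits plus an amortized share of the per-block scale, which is how a 7B model fits in roughly 4 GB.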
repo: gitlab.com/rune.minna/eldr.kenaz
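The RWKV bullet's O(n) claim comes from replacing pairwise attention with a running state. The sketch below shows the general linear-attention recurrence that RWKV-style models build on (not RWKV-7's exact update rule, which adds a learned, data-dependent state transition): keep a d×d state S, and per token do S = decay·S + v·kᵀ, y = S·q. Cost is O(d²) per token, so O(n) in sequence length.

```rust
// Linear-attention recurrence sketch (generic idea, NOT RWKV-7's exact
// formulation): a fixed-size state replaces the O(n^2) attention matrix.
const D: usize = 4; // toy head dimension

fn step(
    state: &mut [[f32; D]; D],
    k: &[f32; D],
    v: &[f32; D],
    q: &[f32; D],
    decay: f32,
) -> [f32; D] {
    // state update: S = decay * S + v k^T (outer product)
    for i in 0..D {
        for j in 0..D {
            state[i][j] = decay * state[i][j] + v[i] * k[j];
        }
    }
    // readout: y = S q
    let mut y = [0.0f32; D];
    for i in 0..D {
        for j in 0..D {
            y[i] += state[i][j] * q[j];
        }
    }
    y
}

fn main() {
    let mut s = [[0.0f32; D]; D];
    let k = [1.0, 0.0, 0.0, 0.0];
    let v = [0.5, 0.5, 0.0, 0.0];
    let q = [1.0, 0.0, 0.0, 0.0];
    // two identical tokens: the second readout mixes decayed old state + new
    let y1 = step(&mut s, &k, &v, &q, 0.5);
    let y2 = step(&mut s, &k, &v, &q, 0.5);
    println!("{:?} {:?}", y1, y2);
}
```

Because the state has fixed size, memory during generation is constant in context length, unlike a KV cache that grows with every token.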
base: @hf.co candle (Rust ML framework)
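One reading of the LoRA MoE bullet, sketched below as an assumption about what "blend adapters at runtime" means: each adapter i contributes a low-rank update BᵢAᵢ, and the effective forward pass is y = W·x + Σᵢ αᵢ·Bᵢ(Aᵢ·x), with gate weights αᵢ chosen per request or per token. All names here are illustrative, not the repo's API.

```rust
// Runtime LoRA blending sketch. Shapes: W is d_out x d_in,
// each A_i is r x d_in, each B_i is d_out x r (r << d_in).
fn matvec(m: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    m.iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum())
        .collect()
}

struct Lora {
    a: Vec<Vec<f32>>, // down-projection, r x d_in
    b: Vec<Vec<f32>>, // up-projection, d_out x r
}

/// y = W x + sum_i alpha_i * B_i (A_i x); alphas are the runtime gate.
fn blended_forward(w: &[Vec<f32>], adapters: &[Lora], alphas: &[f32], x: &[f32]) -> Vec<f32> {
    let mut y = matvec(w, x);
    for (lora, &alpha) in adapters.iter().zip(alphas) {
        let hidden = matvec(&lora.a, x);      // r-dim bottleneck
        let delta = matvec(&lora.b, &hidden); // back to d_out
        for (yi, di) in y.iter_mut().zip(&delta) {
            *yi += alpha * di;
        }
    }
    y
}

fn main() {
    // toy example: identity base weight, one rank-1 adapter, gate 0.5
    let w = vec![vec![1.0f32, 0.0], vec![0.0, 1.0]];
    let adapter = Lora {
        a: vec![vec![1.0, 0.0]],
        b: vec![vec![0.0], vec![1.0]],
    };
    let x = [1.0f32, 0.0];
    let y = blended_forward(&w, &[adapter], &[0.5], &x);
    println!("{:?}", y);
}
```

Keeping the deltas unmerged (rather than folding them into W) is what makes per-request blending cheap: only the gate weights change between calls.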