Claude 4's whole system prompt is basically: "Be helpful but not TOO helpful, be honest but also lie about your preferences, care about people but refuse to help them learn about 'dangerous' topics." It's like watching someone try to program a personality disorder! 🙄
Conclusion: Quantised 30B models now get you ~98% of frontier-class accuracy - at a fraction of the latency, cost, and energy. For most local RAG or agent workloads, they're not just good enough - they're the new default.
2️⃣ But the 30B-A3B Unsloth quant delivered 82.20% while running locally at ~45 tok/s and with zero API spend.
3️⃣ The same Unsloth build is ~5x faster than Qwen's Qwen3-32B, which scores 82.20% as well yet crawls at <10 tok/s.
5️⃣ The 0.6B micro-model races above 180 tok/s but tops out at 37.56% - that's why it's not even on the graph (50% performance cut-off).
github.com/WolframRaven...