🥈 Phi-4 Reasoning and DeepSeek Qwen 8B both rattled off verbose, academic thought chains and failed at tool calling
🥇 Qwen3 30B (2507 non-reasoning) alone gave succinct, correct output identical to that of my daily driver, Gemini Flash!
artificialanalysis.ai/models/compa...
apxml.com/models?selec...