Ilia Breitburg
@breitburg.com
breitburg.com
Epic! Will you release the repo? Would be really cool to play with it
November 9, 2025 at 1:52 PM
Thank you so much! Looking forward to having a speaker API :)
October 28, 2025 at 4:06 PM
"Honesty value might be working correctly"
May 25, 2025 at 11:09 PM
I used Nebius AI as an inference provider (thanks for $100 of free credits), and it's generally OK. I also used Open WebUI as a front-end, which feels like it may collapse at any moment. It has tons of features that are either undocumented or don't work; it's hard to tell which. I missed the Claude UI a lot.
May 9, 2025 at 8:12 PM
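For reference, the whole stack is just an OpenAI-compatible endpoint behind a chat front-end. A minimal sketch of calling the provider directly from Python, skipping Open WebUI; the base URL and model ID below are assumptions, check the provider's docs:

```python
# Sketch: talk to an OpenAI-compatible inference provider directly.
# The base_url and model name below are assumptions, not verified values.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.studio.nebius.ai/v1",  # assumed Nebius endpoint
    api_key=os.environ["NEBIUS_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",  # illustrative model ID
    messages=[
        {"role": "system", "content": "Be concise."},
        {"role": "user", "content": "Proofread this sentence: ..."},
    ],
)
print(response.choices[0].message.content)
```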
Maybe I'm doing something wrong, but Mistral Small 3.1 feels like Llama 2 (7B). I couldn't get any real use out of it. It struggles with every language apart from English and French; I'd say it generally struggles. Responses often begin in English and end in French. Fun.
May 9, 2025 at 8:07 PM
Gemma 3 is GPT-3.5, but with a very annoying personality, and it doesn't seem to be fixable with a system prompt. Vision is the only reason it's worth using. Phi-4 is smaller and much more capable. Which Gemini model were they distilling this from? Looking forward to the 2.5 Pro distill.
May 9, 2025 at 7:59 PM
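A rough sketch of that kind of system-prompt attempt: same question with and without a blunt system message, compare the tone. The prompt wording and model ID here are just placeholders, not what was actually used:

```python
# Sketch: can a system prompt tone down the model's personality?
# Endpoint, model ID, and prompt wording are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://api.studio.nebius.ai/v1", api_key="...")

question = "Summarize the plot of Hamlet in two sentences."
for system in (None, "Answer plainly. No emojis, no enthusiasm, no filler."):
    messages = [{"role": "user", "content": question}]
    if system:
        messages.insert(0, {"role": "system", "content": system})
    reply = client.chat.completions.create(
        model="google/gemma-3-27b-it",  # illustrative model ID
        messages=messages,
    )
    print(f"--- system={system!r} ---")
    print(reply.choices[0].message.content)
```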
The lack of vision is a deal breaker. Otherwise, it would be an ideal, 15x cheaper replacement for Claude. It feels like the one open model that can compete with the closed-source ones. It's fascinating how DeepSeek keeps democratizing intelligence. It's scary good, and just generally scary, too.
May 9, 2025 at 7:51 PM
DeepSeek V3 is surprisingly good: GPT-4.1 or Claude 3.7 Sonnet level in both alignment and intelligence. In all of my tasks, it outperforms all other models (incl. Qwen 3). No hallucinations. It has a character similar to Claude's that feels intelligent. Responses are concise, simple, and dense.
May 9, 2025 at 7:44 PM
I used it for all sorts of tasks like proofreading, summarizing, and asking it random questions. It performs really well. Too bad Phi-4 mini (3.8B) is significantly less intelligent than Phi-4 (14B), seemingly in direct proportion to the size difference. Very excited for Phi-5.
May 9, 2025 at 7:37 PM
Phi-4 is insanely impressive for its size. It's tiny and practically free (it can run on base Macs), but it's aligned at the GPT-4o level if not better. It's truly general-purpose, not strictly optimized for benchmarks, unlike Qwen and some others. The responses are sharp, with no fluff; I like that.
May 9, 2025 at 7:30 PM
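"Can run on base Macs" in practice means something like a 4-bit quant through llama.cpp. A minimal sketch with llama-cpp-python; the GGUF file name is an assumption, use whatever quant you actually downloaded:

```python
# Sketch: running a quantized Phi-4 locally (e.g. on a base M-series Mac).
# Requires `pip install llama-cpp-python` and a downloaded GGUF file;
# the file name below is an assumption.
from llama_cpp import Llama

llm = Llama(
    model_path="phi-4-Q4_K_M.gguf",  # 4-bit quant, roughly 8-9 GB (assumed name)
    n_ctx=4096,
    verbose=False,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Proofread: 'Their going to the store.'"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```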
It hallucinates a LOT; the output isn't reliable. It ignores instructions in about 20% of cases and misinterprets context almost always. I tried using it for coding: the code was almost always runnable, but it often didn't do what was required. They surely RL'd for runnability, but not for quality.
May 9, 2025 at 7:22 PM
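"Runnable but wrong" is easy to check mechanically. A toy version of what that looks like, with a made-up task and a made-up model snippet:

```python
# Toy check for "runnable but wrong": execute model-generated code,
# then test it against what was actually asked. Task and snippet are invented.
generated = """
def dedupe(items):
    # model's attempt: supposed to dedupe while keeping the original order
    return list(set(items))
"""

namespace = {}
exec(generated, namespace)  # runnable? yes, this executes without errors

result = namespace["dedupe"]([3, 1, 3, 2, 1])
print("runnable:", True)
print("correct: ", result == [3, 1, 2])  # False: set() destroyed the order
```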
Qwen 3 feels like GPT-3.5-level alignment combined with GPT-4o-level intelligence. Instruction following over long, complex contexts is really bad: in many cases worse than Phi-4 (14B), yet it spends 10x more tokens. Also, I didn't notice any meaningful difference between 30B-A3B and 235B-A22B.
May 9, 2025 at 7:16 PM
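Roughly the kind of side-by-side that comparison amounts to: the same prompt to both Qwen 3 sizes, then look at the answers and how many completion tokens each one burns. The endpoint, model IDs, and input file are assumptions:

```python
# Sketch: same prompt to both Qwen 3 sizes, compare output and token usage.
# Endpoint and model IDs are assumed; "long_context_task.txt" is a placeholder
# for a long, instruction-heavy prompt.
from openai import OpenAI

client = OpenAI(base_url="https://api.studio.nebius.ai/v1", api_key="...")

prompt = open("long_context_task.txt").read()

for model in ("Qwen/Qwen3-30B-A3B", "Qwen/Qwen3-235B-A22B"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(model, "->", resp.usage.completion_tokens, "completion tokens")
    print(resp.choices[0].message.content[:300], "\n")
```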