erogol.com
@erogol.com
Doing ML

erogol.com
erogol.substack.com
github.com/erogol
My post on MiMo-Audio

open.substack.com/pub/erogol/p...

🔥 Trained on 100M+ hours and shows emergent few-shot learning:
• Voice conversion
• Emotion transfer
• Speech translation
• Cross-modal reasoning

⚡ Key finding: Speech follows the same scaling laws as text LLMs
Model Check - MiMo-Audio: Scaling Speech Pre-Training to 100M Hours
Going over the code and the technical report of the new Speech LM model from Xiaomi that rivals GPT4o-audio and Gemini
open.substack.com
September 22, 2025 at 5:18 PM
Machine Learns #55 is out!

Full of new models… check it out

open.substack.com/pub/erogol/p...
Machine Learns #55
Voice + reasoning releases (Ling‑flash‑2.0, VoxCPM, Kimi K2, ultraVAD) and 2 papers: long‑horizon execution & decay‑free LR schedules.
open.substack.com
September 18, 2025 at 1:01 PM
My breakdown of VibeVoice - new open-weight TTS model from Microsoft.

open.substack.com/pub/erogol/p...
Model Check - VibeVoice: Next-Token Diffusion Meets Long-Form Speech Generation
Going over the code and the technical report of the new TTS model from Microsoft Research.
open.substack.com
August 26, 2025 at 11:54 AM
ms released a tts model… nice…

You can create long-form convos and podcasts with 4 distinct voices

huggingface.co/microsoft/Vi...
microsoft/VibeVoice-1.5B · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
August 25, 2025 at 5:10 PM
KyutaiTTS solved streaming text-to-speech with a state machine that generates audio word-by-word as text arrives.

220ms latency, 10-second voice cloning, 32 concurrent users on single GPU.

No more waiting for complete sentences.

Full analysis: erogol.substack.com/p/model-chec...
Model check - KyutaiTTS: Streaming Text-to-Speech with Delayed Streams Modeling
Going over Kyutai's new TTS model and its delayed streaming model.
erogol.substack.com
August 2, 2025 at 7:46 PM
This is such a great idea
We’re excited to introduce Text-to-LoRA: a Hypernetwork that generates task-specific LLM adapters (LoRAs) based on a text description of the task. Catch our presentation at #ICML2025!

Paper: arxiv.org/abs/2506.06105
Code: github.com/SakanaAI/Tex...
June 12, 2025 at 1:59 PM
claude is the best coding model

gemini causes frequent syntax errors

openai does not even understand the task at hand
June 10, 2025 at 1:38 PM
lately spending some time with Diffusion LMs and working on a NanoGPT-style LLaDA model

so far I've not achieved comparable results to AR models but it's a good start

github.com/erogol/BlaGP...
BlaGPT/bla_gpt/llada.py at main · erogol/BlaGPT
Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible experimentation and exploration. - erogol/BlaGPT
github.com
June 1, 2025 at 2:12 PM
Reposted
This work was done in collaboration with Jeff Clune’s lab at UBC, and led by his PhD students Jenny Zhang and Shengran Hu, together with Cong Lu and Robert Lange.

Paper: arxiv.org/abs/2505.22954
Code: github.com/jennyzzt/dgm
May 30, 2025 at 2:33 AM
⚡ Machine Learns issue 48 is out

🚀 dKV-Cache speeds up diffusion models by up to 10x
🔐 OpenAI's authentication play (think OAuth for AI)
🎯 PaTH Attention beats RoPE on long-context tasks
🤖 Humanoid Robot fights became real

open.substack.com/pub/erogol/p...
Machine Learns #48
OpenAI's 'Sign in with ChatGPT', Meta's AGI ambitions, new models like Gemma 3 & MAGI-1, research breakthroughs in KV caching for diffusion & PaTH Attention, and fresh open-source releases.
open.substack.com
May 28, 2025 at 12:25 PM
Following the bread crumbs, implemented PLE from Gemma3n.

It gave a significant performance boost and resulted in a new best model with almost no compute overhead.

github.com/erogol/BlaGPT
GitHub - erogol/BlaGPT: Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible experimentation and exploration.
github.com
May 27, 2025 at 9:36 AM
My paper notes on 2 new papers

- Model Merging in Pre-training of Large Language Models,
- Do Not Let Low-Probability Tokens Over-Dominate in RL,

open.substack.com/pub/erogol/p...
Paper check: Merging LLMs at Pre-training, Considering Token Probabilities at RL
🔬Two papers in scope: "Model Merging in Pre-training for LLMs" and "Do Not Let Low-Probability Tokens Over-Dominate in RL"
open.substack.com
May 21, 2025 at 12:10 PM
muon really works. got best results in BlaGPT

```
torchrun --standalone --nproc_per_node=8 train.py --run_name best_model --model_name best
```

github.com/erogol/BlaGPT
GitHub - erogol/BlaGPT: Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible experimentation and exploration.
github.com
May 8, 2025 at 1:14 PM
🧵 Here is a small thread with my notes about some of the recent Transformer papers.

- Softpick: an alternative to softmax in Attention
- Canon Layers: mixing states with conv1d
- Parallel Transformer blocks
May 6, 2025 at 12:11 PM
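
For readers who have not seen Softpick, here is a rough sketch of how it differs from softmax. It assumes the rectified form ReLU(e^x - 1) / (sum |e^x - 1| + eps) described in the Softpick paper. The function names, the `eps` default, and the particular stabilizing shift are illustrative choices, not the paper's reference code.

```python
import math


def softmax(scores):
    """Standard softmax: outputs are positive and always sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softpick(scores, eps=1e-6):
    """Rectified alternative: ReLU(e^x - 1) / (sum |e^x - 1| + eps).

    Scores at or below zero get exactly zero weight, and the outputs do not
    have to sum to 1, so a head can attend to (almost) nothing.
    """
    # Shift by a non-negative max so every exponent is <= 0 (no overflow).
    # Numerator and denominator are scaled by the same factor e^-m, so this
    # only rescales how eps compares to the sum.
    m = max(0.0, max(scores))
    shifted = [math.exp(s - m) - math.exp(-m) for s in scores]
    denom = sum(abs(v) for v in shifted) + eps
    return [max(v, 0.0) / denom for v in shifted]


weights = softpick([1.0, -1.0])
```

In this toy call the negative score receives a weight of exactly zero and the weights sum to less than 1, whereas softmax would give every position some probability mass.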
Machine Learns #45 - no fluff AI newsletter - is out!

I normally share bi-weekly, but last week was full enough, so here we go

open.substack.com/pub/erogol/p...
Machine Learns #45
OpenAI's social network & GPT-4.1, China launches $8.2B AI fund, NVIDIA's US manufacturing push, new GLM-4 & MineWorld models, C3PO expert pathways optimization, GigaTok's 3B visual tokenizer...
open.substack.com
April 16, 2025 at 1:54 PM
Updated my LLM usage and cancelled ChatGPT sub for now

Coding - Claude, Gemini 2.5
Reading papers - Claude
Research - Gemini 2.5
Daily - Gemini 2.5
Search - Gemini 2.5
Here is my use of LLMs

Coding - Claude (best by far), QwenChat
Reading papers - Claude
Research - ChatGPT (best UI/UX), Gemini (better results)
Daily - ChatGPT
Search - ChatGPT

I'd love to try searching with Claude, but not there yet.

Any suggestions for change?
April 11, 2025 at 9:06 PM
Machine Learns #44 is out !!

click for no fluff AI newsletter

erogol.substack.com/p/machine-le...
Machine Learns #44
Praxis Sam Altman's tech utopia, Amazon launches Nova Sonic voice AI, Midjourney returns with V7, Llama 4 models debut amid controversy, new brain-to-voice model, NoProp learning ...
erogol.substack.com
April 9, 2025 at 2:18 PM
Next big thing is Brain-LLMs.

Imagine an LLM compressing all world knowledge attached to your brain and ready to serve your thoughts and questions.

You also update it over the internet and pay for a sub. I don't want to think about the ad business :)
April 1, 2025 at 1:26 PM
“If these results generalize to real-world software tasks, extrapolation of this trend predicts that within 5 years, AI systems will be capable of automating many software tasks that currently take humans a month.”

arxiv.org/abs/2503.14499
Measuring AI Ability to Complete Long Tasks
Despite rapid progress on AI benchmarks, the real-world meaning of benchmark performance remains unclear. To quantify the capabilities of AI systems in terms of human capabilities, we propose a new me...
arxiv.org
March 21, 2025 at 10:05 AM
It’s crazy that Gemma3 held up for only about three days
March 18, 2025 at 2:10 PM
Here is my no-fluff newsletter

open.substack.com/pub/erogol/p...
March 12, 2025 at 1:49 PM
Here is my use of LLMs

Coding - Claude (best by far), QwenChat
Reading papers - Claude
Research - ChatGPT (best UI/UX), Gemini (better results)
Daily - ChatGPT
Search - ChatGPT

I'd love to try searching with Claude, but not there yet.

Any suggestions for change?
March 8, 2025 at 1:55 PM
I think diffusion-based LLMs (LLdMs) are better suited as next-generation LLMs

- multiple outputs per iter: faster output generation
- no causal masking: bidirectional attention
- multiple diffusion steps: reasoning at inference time and revising poor outputs
March 3, 2025 at 9:57 AM
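
The three points in the post above can be illustrated with a toy decoding loop. This is only a sketch: `toy_denoiser`, `TARGET`, and the commit schedule are hypothetical stand-ins, not taken from any real diffusion LM. The denoiser sees the whole sequence at once (no causal mask), several tokens are committed per pass, and low-confidence positions stay masked so a later pass can retry them. Revisiting already committed tokens is left out for brevity.

```python
import math
import random

MASK = None  # marks a position that has not been decoded yet
TARGET = "the quick brown fox jumps over the lazy dog".split()


def toy_denoiser(seq, rng):
    """Hypothetical stand-in for one diffusion LM forward pass.

    Returns a (token, confidence) guess for every masked position. A real
    model would condition on all of `seq`, left and right context alike.
    """
    return {i: (TARGET[i], rng.random()) for i, tok in enumerate(seq) if tok is MASK}


def diffusion_decode(steps, seed=0):
    rng = random.Random(seed)
    seq = [MASK] * len(TARGET)
    committed_per_step = []
    for step in range(steps):
        guesses = toy_denoiser(seq, rng)
        if not guesses:
            break
        # Spread the remaining masked positions over the remaining steps.
        n_commit = math.ceil(len(guesses) / (steps - step))
        # Commit the most confident guesses; the rest stay masked for retry.
        best = sorted(guesses, key=lambda i: guesses[i][1], reverse=True)[:n_commit]
        for i in best:
            seq[i] = guesses[i][0]
        committed_per_step.append(len(best))
    return seq, committed_per_step


tokens, schedule = diffusion_decode(steps=3)
```

With nine tokens and three steps, each pass commits three tokens, compared with nine sequential passes for an autoregressive decoder producing one token at a time.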