Lightnews — Scholar-powered news

@flux9665.bsky.social

Reposted

erogol.com

@erogol.com

KyutaiTTS solved streaming text-to-speech with a state machine that generates audio word-by-word as text arrives.

220 ms latency, 10-second voice cloning, 32 concurrent users on a single GPU.

No more waiting for complete sentences.

Full analysis: erogol.substack.com/p/model-chec...

Model check - KyutaiTTS: Streaming Text-to-Speech with Delayed Streams Modeling

Going over Kyutai's new TTS model and its delayed streaming model.

erogol.substack.com

August 2, 2025 at 7:46 PM
