Kyutai (@kyutai-labs.bsky.social)
https://kyutai.org/ Open-Science AI Research Lab based in Paris
Available in PyTorch, MLX, on your iPhone, or in Rust for your server needs!
Project Page: kyutai.org/next/stt
OpenASR Leaderboard: huggingface.co/spaces/hf-au...
June 27, 2025 at 10:31 AM
What’s next? We strongly believe that the future of human-machine interaction lies in natural, full-duplex speech interactions, coupled with customization and extended abilities. Stay tuned for what’s to come!
May 23, 2025 at 10:14 AM
The text LLM’s response is passed to our TTS, conditioned on a 10s voice sample. We’ll provide access to the voice cloning model in a controlled way. The TTS also streams *in text*, reducing latency by starting to speak even before the full text response has been generated.
May 23, 2025 at 10:14 AM
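To make "streaming *in text*" concrete, here is a minimal sketch of the consumption pattern, with `fake_llm_stream` and `FakeTTS` as hypothetical placeholders rather than the actual Unmute API: text is flushed to the TTS in small chunks as the LLM generates it, so audio starts well before the full answer exists.

```python
# Minimal sketch of "streaming in text": start synthesizing speech as soon as
# the first words arrive from the LLM instead of waiting for the full response.
# `fake_llm_stream` and `FakeTTS` are hypothetical placeholders, not the Unmute API.
import time

def fake_llm_stream():
    """Stand-in for a text LLM that yields its answer word by word."""
    for word in "Sure, the Eiffel Tower is about 330 meters tall.".split():
        time.sleep(0.05)        # pretend each word takes 50 ms to generate
        yield word + " "

class FakeTTS:
    """Stand-in for a TTS engine that can consume partial text."""
    def __init__(self, voice_sample: str):
        self.voice_sample = voice_sample  # e.g. path to a ~10 s reference clip

    def speak_incremental(self, text_so_far: str):
        # A real engine would emit audio frames here; we just log them.
        print(f"[speaking] {text_so_far.strip()!r}")

tts = FakeTTS(voice_sample="reference_voice_10s.wav")

buffer = ""
for token in fake_llm_stream():
    buffer += token
    # Flush to the TTS at small boundaries (here: every few words) so audio
    # starts well before the LLM has finished its answer.
    if len(buffer.split()) >= 3:
        tts.speak_incremental(buffer)
        buffer = ""
if buffer:
    tts.speak_incremental(buffer)
```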
Unmute’s speech-to-text is streaming, accurate, and includes a semantic VAD that predicts whether you’ve actually finished speaking or if you’re just pausing mid-sentence, meaning it’s low-latency but doesn’t interrupt you.
May 23, 2025 at 10:14 AM
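As an illustration of what a semantic VAD buys over a plain silence detector, here is a hedged sketch in which the end-of-turn decision combines the current pause length with a predicted probability that the utterance is complete; `end_of_turn_probability` below is a toy heuristic standing in for the learned predictor.

```python
# Sketch of a "semantic" end-of-turn decision: a plain VAD would cut in after a
# fixed silence, while a semantic VAD also asks whether the sentence sounds
# finished. `end_of_turn_probability` is a hypothetical stand-in for the model.

def end_of_turn_probability(transcript_so_far: str) -> float:
    """Toy heuristic standing in for a learned predictor."""
    if transcript_so_far.rstrip().endswith((".", "?", "!")):
        return 0.9   # sounds complete
    if transcript_so_far.rstrip().endswith(("and", "but", "so", ",")):
        return 0.05  # clearly mid-sentence
    return 0.4

def should_end_turn(transcript_so_far: str, silence_ms: float) -> bool:
    p = end_of_turn_probability(transcript_so_far)
    # Long pauses after a complete-sounding sentence end the turn quickly;
    # pauses in the middle of a sentence are tolerated for much longer.
    threshold_ms = 300 if p > 0.7 else 1500
    return silence_ms >= threshold_ms

print(should_end_turn("What's the weather like in Paris?", silence_ms=400))  # True
print(should_end_turn("What's the weather like in, uh", silence_ms=400))     # False
```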
“But what about Moshi?” While Moshi provides unmatched latency and naturalness, it doesn’t yet match the abilities of text models such as function-calling, stronger reasoning, and in-context learning. Unmute allows us to directly bring all of these from text to real-time voice conversations.
May 23, 2025 at 10:14 AM
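For context, the posts above describe a cascade: the streaming speech-to-text feeds a text LLM, and the LLM's reply is spoken by the TTS. The sketch below shows that loop with all three components as hypothetical placeholders; the point is that any capable text model can sit in the middle and contribute its function calling, reasoning, and in-context learning.

```python
# Hedged sketch of the cascade behind Unmute: streaming STT -> text LLM -> TTS.
# All three classes are hypothetical placeholders; the point is that the text
# LLM in the middle is what brings function calling, reasoning, and in-context
# learning to the voice conversation.

class SpeechToText:
    def transcribe(self, audio: bytes) -> str:
        return "What is 17 times 23?"          # stand-in transcript

class TextLLM:
    def reply(self, prompt: str) -> str:
        # A real text LLM could also decide to call tools/functions here.
        return f"17 times 23 is {17 * 23}."

class TextToSpeech:
    def speak(self, text: str) -> bytes:
        return text.encode()                   # stand-in "audio"

def voice_turn(audio_in: bytes) -> bytes:
    stt, llm, tts = SpeechToText(), TextLLM(), TextToSpeech()
    transcript = stt.transcribe(audio_in)      # streaming in the real system
    answer = llm.reply(transcript)             # any capable text model fits here
    return tts.speak(answer)                   # streamed back as it is generated

print(voice_turn(b"...microphone audio..."))
```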
🧑‍💻 Read more about Helium 1 and dactory on our blog: kyutai.org/2025/04/30/h...
🤗 Get the models on HuggingFace: huggingface.co/kyutai/heliu...
📚 Try our pretraining data pipeline on GitHub: github.com/kyutai-labs/...
May 5, 2025 at 10:39 AM
If you have audio data with speaker-separated streams 🗣️🎙️🎤🤖 head over to github.com/kyutai-labs/moshi-finetune and train your own Moshi! We have already witnessed nice extensions of Moshi like J-Moshi 🇯🇵, and we hope this release will allow more people to create their very own voice AI!
April 1, 2025 at 3:47 PM
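For readers unsure what "speaker-separated streams" means in practice, the sketch below illustrates the general idea: one time-aligned audio track per speaker for the same conversation. The array layout shown is illustrative only; the exact on-disk format expected by moshi-finetune is documented in its README.

```python
# Illustrative only: "speaker-separated streams" means one audio track per
# speaker for the same conversation, e.g. the user on one channel and the
# assistant on the other. The exact format expected by moshi-finetune is
# documented in its README; this sketch just shows the idea.
import numpy as np

sample_rate = 24_000
duration_s = 5

# Stand-ins for two time-aligned mono recordings of the same conversation.
user_track = np.zeros(sample_rate * duration_s, dtype=np.float32)
assistant_track = np.zeros(sample_rate * duration_s, dtype=np.float32)

# Stack them into a (channels, samples) array: one stream per speaker, aligned
# in time, so the model can learn full-duplex behaviour (who talks when).
conversation = np.stack([user_track, assistant_track], axis=0)
print(conversation.shape)  # (2, 120000)
```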
Fine-tuning Moshi only takes a couple of hours and can be done on a single GPU thanks to LoRA ⚡. The codebase includes an example Colab notebook that demonstrates the simplicity and efficiency of the procedure 🎮.
🔎 github.com/kyutai-labs/...
April 1, 2025 at 3:47 PM
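For readers unfamiliar with why LoRA makes single-GPU fine-tuning practical: the base model stays frozen and only small low-rank matrices are trained, so gradients and optimizer state are kept for a tiny fraction of the parameters. A generic PyTorch sketch of the idea (not the moshi-finetune code itself):

```python
# Generic LoRA sketch (not the moshi-finetune implementation): the frozen base
# layer is left untouched and only two small low-rank matrices A and B are
# trained, so gradients and optimizer state exist for a tiny parameter subset.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the original weights
            p.requires_grad_(False)
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)    # start as an exact no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} / total: {total:,}")  # ~65k vs ~16.8M
```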
If you want to work on cutting-edge research, join our non-profit AI lab in Paris 🇫🇷

Thanks to Iliad Group, CMA-CGM Group, Schmidt Sciences — and the open-source community.
March 21, 2025 at 2:39 PM
🧰 Fully open-source

We’re releasing a preprint, model weights and a benchmark dataset for spoken visual question answering:

📄 Preprint arxiv.org/abs/2503.15633
🧠 Dataset huggingface.co/datasets/kyu...
🧾 Model weights huggingface.co/kyutai/moshi...
🧪 Inference code github.com/kyutai-labs/...
March 21, 2025 at 2:39 PM
🧠 How it works

MoshiVis builds on Moshi, our speech-to-speech LLM — now enhanced with vision.

Just 206M lightweight parameters on top of a frozen Moshi give it the ability to discuss images while still running in real time on consumer-grade hardware.
March 21, 2025 at 2:39 PM
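To give a feel for how a small number of extra parameters can add vision to a frozen speech model, here is a generic sketch of a common pattern: a gated cross-attention adapter that attends over features from a frozen image encoder and is added to the frozen decoder's hidden states. Module names and sizes are illustrative, not MoshiVis's actual architecture; see the preprint and inference code for the real design.

```python
# Generic sketch of adding vision to a frozen decoder with small adapters:
# a gated cross-attention block attends over image features and is added to the
# hidden states, with the gate initialized at zero so the frozen model's
# behaviour is unchanged at the start of training. Names and sizes are
# illustrative, not MoshiVis's actual architecture.
import torch
import torch.nn as nn

class GatedCrossAttentionAdapter(nn.Module):
    def __init__(self, d_model: int = 1024, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.gate = nn.Parameter(torch.zeros(1))   # starts as a no-op

    def forward(self, hidden: torch.Tensor, image_feats: torch.Tensor):
        attended, _ = self.attn(self.norm(hidden), image_feats, image_feats)
        return hidden + torch.tanh(self.gate) * attended  # frozen path + adapter

# Toy shapes: a batch of 2, 50 speech-token positions, 196 image patches.
hidden = torch.randn(2, 50, 1024)        # from the frozen speech LLM
image_feats = torch.randn(2, 196, 1024)  # from a frozen image encoder
adapter = GatedCrossAttentionAdapter()
out = adapter(hidden, image_feats)
print(out.shape)  # torch.Size([2, 50, 1024])
```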
Try it out 👉 vis.moshi.chat
Blog post 👉 kyutai.org/moshivis
March 21, 2025 at 2:39 PM
Hibiki’s smaller alternative, Hibiki-M, runs on-device in real time. It was obtained by distilling the full model into a version with only 1.7B parameters. On an iPhone 16 Pro, Hibiki-M runs in real time for more than a minute, as shown by Tom.
February 7, 2025 at 8:22 AM
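For readers curious what "distilling" means here, the sketch below shows the standard recipe (not Hibiki-M's exact training setup): the small student is trained to match the frozen teacher's output distribution via a temperature-scaled KL divergence on the logits.

```python
# Generic knowledge-distillation sketch (standard recipe, not Hibiki-M's exact
# setup): the small student is trained to match the frozen teacher's output
# distribution via a temperature-scaled KL divergence on the logits.
import torch
import torch.nn.functional as F

vocab, temperature = 1000, 2.0
teacher_logits = torch.randn(4, vocab)          # from the large frozen model
student_logits = torch.randn(4, vocab, requires_grad=True)

loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.log_softmax(teacher_logits / temperature, dim=-1),
    log_target=True,
    reduction="batchmean",
) * temperature ** 2

loss.backward()
print(float(loss))
```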
To train Hibiki, we generated bilingual simultaneous-interpretation data in which a word only appears in the target once it is predictable from the source. We developed a new method based on an off-the-shelf text translation system and a TTS system with constraints on word locations.
February 7, 2025 at 8:22 AM
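The sketch below illustrates the underlying idea, with a hypothetical alignment heuristic standing in for the actual method: each target word is delayed until the source has revealed enough to make it predictable, and the resulting per-word start times become constraints for the TTS.

```python
# Hedged sketch of the idea behind the data generation: each translated word is
# delayed until the source has revealed enough to make it predictable, and the
# resulting per-word timing constraints are handed to a TTS. The alignment
# heuristic below is a hypothetical stand-in for the actual method.

source_words = ["je", "pars", "a", "paris", "demain", "matin"]
target_words = ["i", "leave", "for", "paris", "tomorrow", "morning"]

def earliest_predictable_from(target_index: int) -> int:
    """Hypothetical alignment: assume target word i needs source words 0..i."""
    return target_index

# Source word end-times in seconds (e.g. from forced alignment of the audio).
source_end_times = [0.4, 0.8, 1.0, 1.5, 2.0, 2.4]

# Each target word may only start once its required source context has been
# heard; enforce monotonically increasing start times for the TTS constraints.
constraints, previous = [], 0.0
for i, word in enumerate(target_words):
    start = max(source_end_times[earliest_predictable_from(i)], previous)
    constraints.append((word, round(start, 2)))
    previous = start + 0.3   # assume ~0.3 s per spoken target word

print(constraints)
# [('i', 0.4), ('leave', 0.8), ('for', 1.1), ('paris', 1.5), ('tomorrow', 2.0), ('morning', 2.4)]
```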
Based on objective and human evaluations, Hibiki outperforms previous systems in quality, naturalness, and speaker similarity, and approaches human interpreters.
Here is an example of a live conference interpretation.
February 7, 2025 at 8:22 AM
We look forward to feedback from the community, which will help us drive the development of Helium and make it the best multilingual lightweight model. Thanks to @hf.co for helping us with this release!
January 13, 2025 at 5:51 PM