Alexandre Défossez
@honualx.bsky.social
Chief Exploration Officer @kyutai-labs.bsky.social in Paris.
We just released unmute.sh 🔇🔊
It is a text LLM wrapper, based on our in-house streaming ASR, TTS, and semantic VAD to reduce latency ⏱️ (loop sketched below).
Unlike Moshi 🟢, Unmute 🔊 is turn-based, but allows customization in two clicks 🖱️: voice and prompt!
Paper and open source coming soon.
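To make the moving parts concrete, here is a toy sketch of such a turn-based loop. Every component below is a stub, not the Unmute API; the end-of-turn heuristic and thresholds are invented for illustration.

```python
# Illustrative stand-ins for the ASR -> semantic VAD -> LLM -> TTS loop.

def asr_stream(audio_chunks):
    """Stub streaming ASR: yields a growing partial transcript."""
    text = ""
    for chunk in audio_chunks:
        text += chunk
        yield text

def semantic_end_of_turn(partial: str, silence_ms: int) -> bool:
    """Stub semantic VAD: end the turn early when the utterance looks
    semantically complete, instead of waiting for a long fixed silence."""
    looks_complete = partial.rstrip().endswith((".", "?", "!"))
    return (looks_complete and silence_ms > 200) or silence_ms > 800

def llm_and_tts(user_text: str) -> None:
    """Stub text LLM + streaming TTS."""
    print("assistant>", f"(spoken reply to {user_text!r})")

partial = ""
for partial in asr_stream(["hello, ", "what can ", "you do?"]):
    print("partial>", partial)
if semantic_end_of_turn(partial, silence_ms=250):
    llm_and_tts(partial)
```

The point of the semantic check is latency: a turn can end after a short pause if the sentence already reads as finished, rather than after a long universal silence timeout.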
May 23, 2025 at 9:51 AM
We just open-sourced a fine-tuning codebase for Moshi!
Have you enjoyed talking to 🟢Moshi and dreamt of making your own speech-to-speech chat experience 🧑‍🔬🤖? It's now possible with the moshi-finetune codebase! Plug in your own dataset and change the voice/tone/personality of Moshi 💚🔌💿. An example after fine-tuning with only 20 hours of the DailyTalk dataset (rough sketch of the idea below). 🧵
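As a rough picture of what "plug your own dataset" means for a token LM, here is a self-contained toy next-token fine-tuning loop. Everything in it (the random dataset, the tiny GRU model) is a stand-in, not the moshi-finetune API; real training would start from the pretrained Moshi checkpoint over Mimi codec tokens of your recordings.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset

class DialogueTokens(Dataset):
    """Random token sequences standing in for your tokenized dialogues."""
    def __init__(self, n=32, seq_len=128, vocab=2048):
        self.data = torch.randint(0, vocab, (n, seq_len))
    def __len__(self):
        return len(self.data)
    def __getitem__(self, i):
        return self.data[i]

vocab, dim = 2048, 128
embed = nn.Embedding(vocab, dim)
rnn = nn.GRU(dim, dim, batch_first=True)   # toy backbone, not Moshi
head = nn.Linear(dim, vocab)
params = list(embed.parameters()) + list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.AdamW(params, lr=1e-4)

for batch in DataLoader(DialogueTokens(), batch_size=8):
    x, target = batch[:, :-1], batch[:, 1:]          # next-token objective
    out, _ = rnn(embed(x))
    loss = nn.functional.cross_entropy(head(out).transpose(1, 2), target)
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"loss: {loss.item():.3f}")
```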
April 1, 2025 at 4:47 PM
Just back from holidays, so a bit late in announcing MoshiVis, which extends Moshi's multimodal capabilities to take in images 📷.
Only 200M parameters were added to plug in a ViT through cross-attention with gating 🖼️🔀🎤 (sketched below).
Training relies on a mix of text-only and text+audio synthetic data (~20k hours) 💽
Meet MoshiVis🎙️🖼️, the first open-source real-time speech model that can talk about images!

It sees, understands, and talks about images — naturally, and out loud.

This opens up new applications, from audio description for the visually impaired to visual access to information.
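The gating trick in one screen: a minimal PyTorch sketch of a Flamingo-style gated cross-attention adapter. The dimensions, names, and the tanh-gated residual are assumptions in the spirit of the description above, not the MoshiVis code.

```python
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    """Minimal sketch of a gated cross-attention adapter: the backbone's
    hidden states attend to projected ViT patch embeddings, and a gate
    initialized at zero leaves the pretrained model unchanged at init."""

    def __init__(self, dim: int = 512, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # tanh(0) = 0 at init

    def forward(self, x: torch.Tensor, image_tokens: torch.Tensor) -> torch.Tensor:
        # x: (B, T, dim) speech/text stream; image_tokens: (B, S, dim)
        attended, _ = self.attn(self.norm(x), image_tokens, image_tokens)
        return x + torch.tanh(self.gate) * attended

# Usage: fuse 196 image patch embeddings into a 50-step audio/text stream.
layer = GatedCrossAttention()
out = layer(torch.randn(2, 50, 512), torch.randn(2, 196, 512))
print(out.shape)  # torch.Size([2, 50, 512])
```

Starting the gate at zero is the standard way to add such adapters without disturbing the frozen pretrained model at the beginning of training.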
March 31, 2025 at 10:06 AM
I'll start my presentation in 10 minutes; you can join on Zoom: concordia-ca.zoom.us/j/81541793947
See you there!
March 13, 2025 at 2:50 PM
I'll present a dive into Moshi 🟢 and our translation model Hibiki 🇫🇷♻️🇬🇧 as part of the next @convai-rg.bsky.social reading group 👨‍🏫📗.

📅 13th of March 🕰️ 11am ET, 4pm in Paris.

I'll discuss Mimi 🗜️ and multi-stream audio modeling 🔊 (toy sketch after the announcement below).
Join on Zoom, replay on YT.

⬛ ⬛ 🟧 🟧 🟨 🟨 🟩 🟩 🟩 ⬛
⬛ 🟧 🟧 🟨 🟨 🟩 🟩 🟩 ⬛ ⬛
📢 Join our Conversational AI Reading Group!
📅 Thursday, March 13 | 11 AM - 12 PM EST
🎙Speaker: Alexandre Défossez
📖 Topic: "Moshi: a speech-text foundation model for real-time dialogue"
🔗 Details: (poonehmousavi.github.io/rg)
▶️ Missed a session? Watch on YouTube: (www.youtube.com/@CONVAI_RG) 🚀
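For the "multi-stream audio modeling" part, here is a toy PyTorch sketch of the idea from the Moshi paper: each time step carries several token streams (text plus audio codec levels), embedded jointly and predicted by a shared causal backbone. The depth transformer Moshi uses to predict the streams one after another within a step is omitted; all sizes and names are made up for illustration.

```python
import torch
import torch.nn as nn

class MultiStreamLM(nn.Module):
    """Toy multi-stream LM: at every step, predict one token per stream
    (e.g. 1 text stream + 8 Mimi codec streams). Stream embeddings are
    summed at the input; Moshi's depth transformer is omitted for brevity."""

    def __init__(self, vocab: int = 2048, streams: int = 9, dim: int = 256):
        super().__init__()
        self.embeds = nn.ModuleList(nn.Embedding(vocab, dim) for _ in range(streams))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.heads = nn.ModuleList(nn.Linear(dim, vocab) for _ in range(streams))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, T, streams) -> logits: (B, T, streams, vocab)
        x = sum(emb(tokens[..., s]) for s, emb in enumerate(self.embeds))
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.shape[1])
        h = self.temporal(x, mask=mask)  # causal over time
        return torch.stack([head(h) for head in self.heads], dim=2)

model = MultiStreamLM()
logits = model(torch.randint(0, 2048, (2, 30, 9)))
print(logits.shape)  # torch.Size([2, 30, 9, 2048])
```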
March 10, 2025 at 5:34 PM
Reposted by Alexandre Défossez
Even Kavinsky 🎧🪩 can't break Hibiki! Just like Moshi, Hibiki is robust to extreme background conditions 💥🔊.
February 11, 2025 at 4:11 PM
Reposted by Alexandre Défossez
Very happy to have participated in this *beautiful* documentary by Florent Muller on the frontiers between humans and machines,
following @yann-lecun.bsky.social and so many humbling figures of AI:
www.france.tv/documentaire...
February 11, 2025 at 9:32 AM
Reposted by Alexandre Défossez
Our latest studies on decoding text from brain activity, reviewed by MIT Tech Review @technologyreview.com

www.technologyreview.com/2025/02/07/1...
February 10, 2025 at 12:13 PM
Excited to meet and exchange with stakeholders from all around the world at the AI Summit 🌍
February 10, 2025 at 1:24 PM
We just released Hibiki, a 🎙️-to-🔊 simultaneous translation model 🇫🇷🇬🇧
We leverage a large synthetic corpus built with the text translation model MADLAD and our own TTS, plus a simple lag rule (illustrated below).
The model is decoder-only and runs at scale, even on device 📲
github.com/kyutai-labs/hibiki
Meet Hibiki, our simultaneous speech-to-speech translation model, currently supporting 🇫🇷➡️🇬🇧.
Hibiki produces spoken and text translations of the input speech in real-time, while preserving the speaker’s voice and optimally adapting its pace based on the semantic content of the source speech. 🧵
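The "simple lag rule" is reminiscent of the classic wait-k policy for simultaneous translation; here is a toy sketch of that idea. It is only an illustration of lagging the target behind the source, not Hibiki's exact recipe.

```python
def wait_k_translate(source_words, step, k=3):
    """Toy wait-k policy: start emitting once k source words have been
    read, then emit one target word per new source word.
    step(prefix, produced) -> next target word (stub translation model)."""
    produced = []
    for t in range(1, len(source_words) + 1):
        if t >= k:  # stay k words behind the source
            produced.append(step(source_words[:t], produced))
    return produced

# Usage with a trivial stand-in "translator" that echoes the lagged word:
out = wait_k_translate("bonjour je suis ravi d'être ici".split(),
                       step=lambda prefix, done: prefix[len(done)],
                       k=3)
print(out)  # ['bonjour', 'je', 'suis', 'ravi'] -- lags 2 words behind
```

Trading a small constant lag for access to more source context is what lets a simultaneous model stay both fast and accurate.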
February 7, 2025 at 9:47 PM
Reposted by Alexandre Défossez
🚨Job alert (Please RT)

What: masters internship and/or PhD positions
Where: Rothschild Foundation Hospital (Paris, France)
Topic: AI and Neuroscience
Supervised by: Pierre Bourdillon and myself
Apply here: forms.gle/KKnea2QAjhAe...
Deadline: Feb 5th
January 15, 2025 at 8:56 AM
We just released the Helium-1 model, a 2B multi-lingual LLM which @exgrv.bsky.social and @lmazare.bsky.social have been crafting for us! Best model so far under 2.17B params on multi-lingual benchmarks 🇬🇧🇮🇹🇪🇸🇵🇹🇫🇷🇩🇪
On HF, under CC-BY licence: huggingface.co/kyutai/heliu...
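To try it, a minimal sketch with the standard Hugging Face transformers API. The model id below is an assumption (the link above is truncated); check the kyutai page on the Hub for the released name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kyutai/helium-1-preview-2b"  # assumed id, verify on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("La capitale de l'Italie est", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```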
January 13, 2025 at 6:10 PM