Patrick Pérez
ptrkprz.bsky.social
AI & CV scientist, CEO at @kyutai-labs.bsky.social
Unmute adds ears and vocal cords to your favorite text-based language model. A seamless plug-and-play augmentation with easy personalisation through voice conditioning and text instructions. We will open-source it shortly.
Talk to unmute.sh 🔊, the most modular voice AI around. Empower any text LLM with voice, instantly, by wrapping it with our new speech-to-text and text-to-speech. Any personality, any voice. Interruptible, smart turn-taking. We’ll open-source everything within the next few weeks.
May 24, 2025 at 8:06 AM
After its preview release last January, Helium 1 now arrives in full, with 2 billion well-used open parameters. 🇧🇬 🇭🇷 🇨🇿 🇩🇰 🇳🇱 🇬🇧 🇪🇪 🇫🇮 🇫🇷 🇩🇪 🇬🇷 🇭🇺 🇮🇪 🇮🇹 🇱🇻 🇱🇹 🇲🇹 🇵🇱 🇵🇹 🇷🇴 🇸🇰 🇸🇮 🇪🇸 🇸🇪
🚀 Thrilled to announce Helium 1, our new 2B-parameter LLM, now available alongside dactory, an open-source pipeline to reproduce its training dataset covering all 24 EU official languages. Helium sets new standards within its size class on European languages!
May 7, 2025 at 10:34 PM
One virtue of open models is that they can be adapted to one’s needs. This is even more impactful when finetuning is data- and compute-efficient, which is something we strive for at Kyutai. Let’s start with Moshi, our groundbreaking multi-stream spoken dialogue model.
Have you enjoyed talking to 🟢Moshi and dreamt of making your own speech-to-speech chat experience🧑‍🔬🤖? It's now possible with the moshi-finetune codebase! Plug in your own dataset and change the voice/tone/personality of Moshi 💚🔌💿. Here is an example after finetuning on only 20 hours of the DailyTalk dataset. 🧵
April 2, 2025 at 10:28 AM
Reposted by Patrick Pérez
🔥🔥🔥 CV Folks, I have some news! We're organizing a 1-day meeting in central Paris on June 6th before CVPR called CVPR@Paris (similar to NeurIPS@Paris) 🥐🍾🥖🍷

Registration is open (it's free) with priority given to authors of accepted papers: cvprinparis.github.io/CVPR2025InPa...

Big 🧵👇 with details!
March 21, 2025 at 6:43 AM
I wish it were a disgraceful video generated by an unhinged AI. Unfortunately, it is the disgraceful new reality. Shame on Trump and Vance.
www.theguardian.com/us-news/2025...
Diplomacy dies on live TV as Trump and Vance gang up to bully Ukraine leader
US president said his horrific blow-up would make ‘great television’ – the White House has never seen anything like it
March 1, 2025 at 9:13 AM
Pushing testing dedication to the next level.
Even Kavinsky 🎧🪩 can't break Hibiki! Just like Moshi, Hibiki is robust to extreme background conditions 💥🔊.
February 11, 2025 at 11:40 PM
Simultaneous speech-to-speech translation on mobile is a world first. In the near future, no one will ever be lost in translation (at least for linguistic reasons).
Meet Hibiki, our simultaneous speech-to-speech translation model, currently supporting 🇫🇷➡️🇬🇧.
Hibiki produces spoken and text translations of the input speech in real time, while preserving the speaker’s voice and optimally adapting its pace based on the semantic content of the source speech. 🧵
February 10, 2025 at 10:14 PM
A new release on our journey towards easy-to-use, fully open models.
Meet Helium-1 preview, our 2B multi-lingual LLM, targeting edge and mobile devices, released under a CC-BY license. Start building with it today!
huggingface.co/kyutai/heliu...
kyutai/helium-1-preview-2b · Hugging Face
January 16, 2025 at 10:44 AM