While other models need the whole audio up front, ours delivers top-tier accuracy on streaming content.
Open, fast, and ready for production!
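To make the streaming claim concrete, here is a minimal sketch of what chunk-by-chunk use looks like: the model keeps a running state and emits output as audio arrives, instead of waiting for the full recording. `StreamingModel`, its `step` method, and the frame size are hypothetical placeholders for illustration, not the actual API of this release.

```python
# Minimal sketch contrasting batch and streaming use. All names here are
# hypothetical stand-ins, not the real interface.
import numpy as np

FRAME = 1920  # assumed frame size, e.g. 80 ms of 24 kHz audio per step


class StreamingModel:
    """Hypothetical model that consumes audio frame by frame."""

    def __init__(self) -> None:
        self.cache = []  # stands in for cached attention / recurrent state

    def step(self, frame: np.ndarray) -> str:
        # A real model would update its cache and emit any tokens that are
        # ready; this placeholder just records the frame and emits nothing.
        self.cache.append(frame)
        return ""


def run_streaming(audio: np.ndarray) -> str:
    """Emit output while audio is still arriving, instead of the
    batch-only pattern of a single call on the whole recording."""
    model = StreamingModel()
    pieces = []
    for start in range(0, len(audio), FRAME):
        pieces.append(model.step(audio[start:start + FRAME]))
    return "".join(pieces)
```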
Only 200M parameters were added to plug a ViT into the model via cross-attention with gating 🖼️🔀🎤
Training relies on a mix of text-only and synthetic text+audio data (~20k hours) 💽
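As a rough illustration of that adapter design, the sketch below shows a gated cross-attention block that lets a decoder attend to ViT patch embeddings, with a zero-initialised gate so the pretrained model behaves identically at the start of training. The names, dimensions, and placement between layers are assumptions for illustration, not the actual implementation.

```python
# Sketch of a gated cross-attention adapter injecting ViT image features
# into a pretrained decoder. Dimensions and names are illustrative only.
import torch
import torch.nn as nn


class GatedCrossAttention(nn.Module):
    def __init__(self, dim: int = 1024, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Gate initialised at zero: the adapter starts as an identity
        # mapping, leaving the pretrained decoder untouched early on.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden: torch.Tensor, image_tokens: torch.Tensor) -> torch.Tensor:
        # hidden:       (batch, seq_len, dim)   decoder hidden states
        # image_tokens: (batch, n_patches, dim) ViT features projected to dim
        attended, _ = self.attn(self.norm(hidden), image_tokens, image_tokens)
        return hidden + torch.tanh(self.gate) * attended


# Usage: insert such blocks between decoder layers; only the adapter (and
# the image projection) contributes new weights on top of the base model.
x = torch.randn(2, 16, 1024)        # decoder states
img = torch.randn(2, 64, 1024)      # projected ViT patch embeddings
y = GatedCrossAttention()(x, img)   # same shape as x
```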
It sees, understands, and talks about images — naturally, and out loud.
This opens up new applications, from audio description for the visually impaired to visual access to information.