📜: arxiv.org/abs/2503.08533
Live Demo: huggingface.co/spaces/Siddh...
Can Audio Foundation Models like Moshi and GPT-4o truly engage in natural conversations? 🗣️🔊
We benchmark their turn-taking abilities and uncover major gaps in conversational AI. 🧵👇
📜: arxiv.org/abs/2503.01174