Anuj Diwan
anujdiwan.bsky.social
UT CS PhD Student working on generative speech models. Prev. Student Researcher @ Google DeepMind, FAIR (Meta AI) and Adobe Research. 2021 BTech CS IIT Bombay.
Pinned
Introducing ParaSpeechCaps, our large-scale style captions dataset that enables rich, expressive control for text-to-speech models!
Beyond basic pitch or speed controls, our models can generate speech that sounds "guttural", "scared", "whispered" and more; 59 style tags in total.

🧵👇
March 8, 2025 at 4:04 AM
Reposted by Anuj Diwan
I've started putting together a starter pack with people working on Speech Technology and Speech Science: go.bsky.app/BQ7mbkA

(Self-)nominations welcome!
November 19, 2024 at 11:13 AM