Anuj Diwan
anujdiwan.bsky.social
UT CS PhD Student working on generative speech models. Prev. Student Researcher @ Google DeepMind, FAIR (Meta AI) and Adobe Research. 2021 BTech CS IIT Bombay.
Thanks to my amazing collaborators Zhisheng Zheng, @eunsol.bsky.social and David Harwath!
Paper: arxiv.org/abs/2503.04713
Code: github.com/ajd12342/par...
Dataset: huggingface.co/datasets/ajd...
Model: huggingface.co/ajd12342/par...
Demo: paraspeechcaps.github.io
March 8, 2025 at 4:04 AM
We finetune Parler-TTS-Mini-v1 on ParaSpeechCaps and achieve significant improvements in both speech-style consistency and naturalness over our best-performing baseline (which combines existing smaller-scale style datasets)!
March 8, 2025 at 4:04 AM
ParaSpeechCaps contains 282 hours of human-labelled data and 2427 hours of automatically-labelled data. Human evaluators rate our scaled data as on par with the human-labelled data! We carefully ablate our dataset design choices.
March 8, 2025 at 4:04 AM
ParaSpeechCaps is the first large-scale dataset that supports both speaker-level intrinsic tags and utterance-level situational tags. Our key contribution is a novel pipeline that, for the first time, enables scalable, automatic style annotation across such a wide variety of rich styles.
March 8, 2025 at 4:04 AM
Thanks for this list! Would appreciate being added :)
November 22, 2024 at 4:37 PM