We finetune Parler-TTS-Mini-v1 on ParaSpeechCaps and achieve significant improvements in both speech-style consistency and naturalness over our best-performing baseline (which combines existing smaller-scale style datasets)!
March 8, 2025 at 4:04 AM
ParaSpeechCaps contains 282 hours of human-labelled data and 2,427 hours of automatically labelled data. Human evaluators rate our scaled, automatically labelled data as on par with the human-labelled data! We carefully ablate our dataset design choices.
ParaSpeechCaps is the first large-scale dataset that supports both speaker-level intrinsic tags and utterance-level situational tags. Our key contribution is a novel pipeline that, for the first time, scales automatic style annotation to such a wide variety of rich styles.
Introducing ParaSpeechCaps, our large-scale style captions dataset that enables rich, expressive control for text-to-speech models! Beyond basic pitch and speed controls, our models can generate speech that sounds "guttural", "scared", "whispered", and more: 59 style tags in total.
🧵👇