Lightnews — Scholar-powered news

Anuj Diwan

@anujdiwan.bsky.social

190 followers 120 following 6 posts

UT CS PhD Student working on generative speech models. Prev. Student Researcher @ Google DeepMind, FAIR (Meta AI) and Adobe Research. 2021 BTech CS IIT Bombay.


Pinned

Anuj Diwan @anujdiwan.bsky.social · Mar 8

Introducing ParaSpeechCaps, our large-scale style captions dataset that enables rich, expressive control for text-to-speech models!
Beyond basic pitch or speed controls, our models can generate speech that sounds "guttural", "scared", "whispered" and more; 59 style tags in total.

🧵👇

Randomly sampled examples from ParaSpeechCaps. Our style prompts cover rich tags describing complex styles such as rhythm, clarity and emotion, in contrast to earlier basic style prompts that only contain gender, pitch and speed levels. We highlight rich style tags in vibrant colors and basic style tags in gray.

March 8, 2025 at 4:04 AM

Reposted by Anuj Diwan

Grzegorz Chrupała

@grzegorz.chrupala.me

I've started putting together a starter pack with people working on Speech Technology and Speech Science: go.bsky.app/BQ7mbkA

(Self-)nominations welcome!

November 19, 2024 at 11:13 AM
