ritheshkumar.bsky.social
@ritheshkumar.bsky.social
Researcher in audio and speech generative models (SampleRNN, MelGAN, DAC, …)
Research Scientist @AdobeResearch. Ex @DescriptApp, @Mila_Quebec
https://ritheshkumar.com
Reposted
The code for Simplified and Generalized Masked Diffusion for Discrete Data (Jiaxin Shi et al.) has been released, and a lecture by @arnauddoucet.bsky.social on this topic is also available!

🐍 Code: github.com/google-deepm...
📄 Article: arxiv.org/abs/2406.04329
📼 Video: www.youtube.com/watch?v=qj9B...
December 14, 2024 at 12:47 PM
Reposted
new paper! 🗣️Sketch2Sound💥

Sketch2Sound can create sounds from sonic imitations (i.e., a vocal imitation or a reference sound) via interpretable, time-varying control signals.

paper: arxiv.org/abs/2412.08550
web: hugofloresgarcia.art/sketch2sound
December 12, 2024 at 2:43 PM
Reposted
🎥 Introducing MultiFoley, a video-aware audio generation method with multimodal controls! 🔊
We can
⌨️Make a typewriter sound like a piano 🎹
🐱Make a cat meow sound like a lion's roar! 🦁
⏱️Perfectly time existing SFX 💥 to a video.

arXiv: arxiv.org/abs/2411.17698
website: ificl.github.io/MultiFoley/
November 27, 2024 at 2:58 AM
Reposted
I initiated a starter pack for Audio ML. Let me know if you'd like to be added/removed.
go.bsky.app/LGmct4z
November 18, 2024 at 4:46 AM
Reposted
Made a feed that tries to index paper threads only: bsky.app/profile/psee.... To get into the feed, include "arxiv.org" somewhere in your post + don't be a bot. My tiny contribution to the recent migration! Built w/ @skyfeed.app. Planning on some paper threads of my own soon...
November 24, 2024 at 4:01 AM