Ziyang Chen
czyang.bsky.social
Ph.D. student @ UMich EECS. Multimodal learning, audio-visual learning, and computer vision.
Previously a research intern @Adobe and @Meta.

https://ificl.github.io/
We jointly train our model on high-quality text-audio pairs as well as videos, enabling our model to generate full-bandwidth professional audio with fine-grained creative control and synchronization.
November 27, 2024 at 2:58 AM
MultiFoley is a unified framework for video-guided audio generation that leverages text, audio, and video conditioning within a single model. As a result, it supports text-guided foley, audio-guided foley (e.g., syncing your favorite sample with the video), and foley audio extension.
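To make the "single model, many controls" idea concrete, here is a minimal, hypothetical sketch (not the authors' code; all names are illustrative) of a unified conditioning interface where any subset of text, audio, and video prompts can be supplied, and missing modalities fall back to a null embedding so one model handles every control combination:

```python
# Hypothetical sketch of unified multimodal conditioning (illustrative only;
# not MultiFoley's actual implementation).
from dataclasses import dataclass
from typing import List, Optional

NULL = 0.0  # placeholder value standing in for a learned "null" embedding

@dataclass
class FoleyConditions:
    text: Optional[List[float]] = None   # text-prompt embedding (or None)
    audio: Optional[List[float]] = None  # reference-audio embedding (or None)
    video: Optional[List[float]] = None  # video-frame embedding (or None)

def build_condition_vector(c: FoleyConditions, dim: int = 4) -> List[float]:
    """Concatenate the three modality embeddings, padding absent ones with
    null embeddings, so a single model sees a fixed-size conditioning input
    regardless of which controls the user provides."""
    parts: List[float] = []
    for emb in (c.text, c.audio, c.video):
        parts.extend(emb if emb is not None else [NULL] * dim)
    return parts

# Text-only foley: the audio and video slots fall back to null embeddings.
cond = build_condition_vector(FoleyConditions(text=[0.1, 0.2, 0.3, 0.4]))
```

In this sketch, dropping a modality to its null embedding is what lets the same network serve text-guided, audio-guided, and combined control at inference time.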
November 27, 2024 at 2:58 AM
🎥 Introducing MultiFoley, a video-aware audio generation method with multimodal controls! 🔊
We can
⌨️Make a typewriter sound like a piano 🎹
🐱Make a cat's meow sound like a lion's roar! 🦁
⏱️Perfectly time existing SFX 💥 to a video.

arXiv: arxiv.org/abs/2411.17698
website: ificl.github.io/MultiFoley/
November 27, 2024 at 2:58 AM