Edson Araujo
edsonroteia.bsky.social
PhD Student at Goethe University Frankfurt
edsonroteia.github.io
Thanks to all my co-authors: Andrew Rouditchenko, Yuan Gong, Saurabh Bhati, Samuel Thomas, Brian Kingsbury, @leokarlin.bsky.social, Rogerio Feris, James Glass, @hildekuehne.bsky.social!
Collaboration through MIT-IBM Watson AI Lab 🚀
And thank you Adam Zewe for covering our work on MIT News!
🧵(7/7)
https://cvprconference.bsky.social
May 22, 2025 at 1:46 PM
✨ Dive deeper into CAV-MAE Sync:
🔗 Paper: arxiv.org/abs/2505.01237
🔗 Project Page: edsonroteia.github.io/cav-mae-sync/
🔗 Code: github.com/edsonroteia/...
🔗 MIT News: news.mit.edu/2025/ai-lear...
🧵(6/7)
📊 We evaluated CAV-MAE Sync on AudioSet, VGGSound, & ADE20K Sound:
➡️ Achieved strong results in zero-shot retrieval, classification, & localization.
➡️ Outperforms more complex architectures, demonstrating the power of our approach.
🧵(5/7)
➡️ Improved Spatial Localization: Learnable "register tokens" are added to reduce the semantic load on patch tokens, helping the model focus on finer details for reconstruction.
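A minimal numpy sketch of the register-token idea: extra learnable tokens are appended to the patch sequence before the encoder and discarded afterwards, so patch tokens are freed from carrying global context. All shapes and the initialization here are illustrative assumptions, and the encoder is a stand-in, not the actual CAV-MAE Sync model:

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, dim, num_registers = 196, 768, 8   # hypothetical sizes

patch_tokens = rng.normal(size=(num_patches, dim))
# learnable register tokens (toy initialization)
register_tokens = rng.normal(size=(num_registers, dim)) * 0.02

# registers are appended to the sequence before the encoder...
tokens = np.concatenate([patch_tokens, register_tokens], axis=0)
assert tokens.shape == (num_patches + num_registers, dim)

# ...attend alongside the patches, and are dropped on the way out,
# so patch tokens can stay specialized for local reconstruction
encoded = tokens                     # stand-in for the transformer encoder
patch_out = encoded[:num_patches]    # only patch tokens feed the decoder
assert patch_out.shape == (num_patches, dim)
```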
🧵(4/7)
💡 Our approach:
➡️ Fine-Grained Alignment: We treat audio as a temporal sequence, aligning it with individual video frames rather than a single coarse clip-level representation.
➡️ Decoupled Objectives: "Global tokens" separate the contrastive learning objective from patch-level MAE reconstruction.
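A toy sketch of the fine-grained alignment idea, assuming an InfoNCE-style contrastive loss where each audio segment's positive is the video frame at the same time step; the function name, shapes, and temperature are illustrative, not the paper's exact implementation:

```python
import numpy as np

def frame_level_contrastive_loss(audio, video, temperature=0.07):
    """Toy InfoNCE-style loss: positives sit on the diagonal of the
    per-time-step audio/video similarity matrix."""
    a = audio / np.linalg.norm(audio, axis=1, keepdims=True)
    v = video / np.linalg.norm(video, axis=1, keepdims=True)
    logits = a @ v.T / temperature                  # (T, T) similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(np.diag(probs)).mean()

rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 8))                    # 4 time steps, dim 8
loss_aligned = frame_level_contrastive_loss(frames, frames)
loss_shuffled = frame_level_contrastive_loss(frames, frames[::-1])
assert loss_aligned < loss_shuffled                 # temporal match is rewarded
```

The point of the diagonal-positive setup: temporally misaligned pairs are penalized, which a single clip-level embedding cannot express.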
🧵(3/7)
Problems with the original CAV-MAE [Gong et al. 2023]:
🔹 Global audio representations fail to capture fine-grained temporal correspondences with visual frames.
🔹 Jointly learning reconstruction & cross-modal alignment can lead to suboptimal performance.
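The first problem can be seen in a two-line toy example: a single global (mean-pooled) audio embedding is order-invariant, so it cannot tell apart two clips whose events happen in a different temporal order. This is an illustrative numpy sketch, not code from the paper:

```python
import numpy as np

# two toy "audio clips" with the same segments in reversed temporal order
clip_a = np.array([[1., 0.], [0., 1.], [1., 1.]])
clip_b = clip_a[::-1]

# mean pooling to one global vector discards the ordering entirely:
assert np.allclose(clip_a.mean(axis=0), clip_b.mean(axis=0))
# ...even though the frame-level sequences clearly differ:
assert not np.allclose(clip_a, clip_b)
```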
🧵 (2/7)