www.justinsalamon.com
The secret sauce: A memory-efficient and calibrated frame-wise objective with logit adjustment to address spurious correlations, such as event dependencies and label imbalances during training
A model trained to produce a calibrated likelihood for *any* text prompt.
FLAM outperforms prior self-supervised models on both closed-set and open-set SED, while preserving strong retrieval and zero-shot classification accuracy
"So use CLAP", some of you will say.
The problem is its output likelihoods are not calibrated for different prompts :(
That's OK for ranked retrieval, but for detection it's a no-go.
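To make the ranking-vs-detection point concrete, here is a toy sketch. The scores and prompt names are made up for illustration, not outputs of CLAP or FLAM. Each prompt ranks the clips correctly on its own, but because the scores sit on different scales per prompt, no single threshold detects both events.

```python
# Hypothetical similarity scores (invented for illustration) for 4 clips.
# Clips 0 and 1 contain the event named by the prompt; clips 2 and 3 do not.
scores = {
    "dog barking": [0.62, 0.58, 0.41, 0.35],
    "door creak": [0.31, 0.27, 0.12, 0.08],
}
labels = [True, True, False, False]


def ranking_is_correct(prompt_scores, truth):
    """True if every positive clip scores above every negative clip."""
    pos = [s for s, t in zip(prompt_scores, truth) if t]
    neg = [s for s, t in zip(prompt_scores, truth) if not t]
    return min(pos) > max(neg)


def detect(prompt_scores, threshold):
    """Binary detection: a clip is a hit if its score reaches the threshold."""
    return [s >= threshold for s in prompt_scores]


def shared_threshold_exists(all_scores, truth):
    """Try every observed score as a threshold shared by all prompts."""
    candidates = sorted({s for v in all_scores.values() for s in v})
    return any(
        all(detect(v, t) == truth for v in all_scores.values())
        for t in candidates
    )


# Ranking works per prompt, since only the order of scores matters.
ranking_ok = {p: ranking_is_correct(v, labels) for p, v in scores.items()}

# Detection with one fixed threshold works for one prompt but not the other.
detection_ok = {p: detect(v, 0.5) == labels for p, v in scores.items()}

# And no other shared threshold rescues it.
any_shared = shared_threshold_exists(scores, labels)

print(ranking_ok, detection_ok, any_shared)
```

A calibrated model would put both prompts on a common scale, so one threshold could serve any prompt.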
It has some applications, but it doesn't address general purpose sound search.
arXiv (ICML'25): arxiv.org/abs/2505.053...
demos: flam-model.github.io
Led by Yusong Wu, with @tsirif.bsky.social Ke Chen, Cheng-Zhi Anna Huang, Aaron Courville, @urinieto.bsky.social @pseeth.bsky.social
w/ @urinieto.bsky.social @pseeth.bsky.social
Amazing job @hugofloresgarcia.bsky.social @pseeth.bsky.social @urinieto.bsky.social
I should've done my hair...
www.instagram.com/reel/DEEBRhd...
MultiFoley, a Video-to-Audio model that generates perfectly synced audio for video at 48 kHz and supports multimodal conditioning.
More on MultiFoley here: bsky.app/profile/czya...
The Sound Design AI Group (SODA) is looking for an exceptional research engineer to join us in building the future of AI-assisted audio and video creation.
Strong ML background, GenAI experience a plus.
Details: adobe.wd5.myworkdayjobs.com/external_exp...