Lightnews — Scholar-powered news

Justin Salamon

@justinsalamon.bsky.social

340 followers 130 following 15 posts

Head of Sound Design AI Research at Adobe. Machine learning and signal processing for audio & video. Musician. He/him.
www.justinsalamon.com

Justin Salamon

@justinsalamon.bsky.social

FLAM is trained jointly on instance (global) and frame-wise (local) objectives.

The secret sauce: a memory-efficient, calibrated frame-wise objective with logit adjustment to address spurious correlations, such as event dependencies and label imbalances, during training.

June 24, 2025 at 7:28 PM
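
To make the "logit adjustment" idea in the post above concrete, here is a minimal sketch of one generic form of it: adding the log-odds of a label's prior frequency to the raw logit before computing binary cross-entropy. This is an assumption chosen for illustration, not FLAM's actual objective; the function names and the prior values are hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def prior_log_odds(prior: float) -> float:
    # Log-odds of how often the event is active in the training data.
    return math.log(prior / (1.0 - prior))

def adjusted_bce(logit: float, label: int, prior: float) -> float:
    # Hypothetical prior-based logit adjustment: shift the raw logit by the
    # prior's log-odds, then apply ordinary binary cross-entropy.
    p = sigmoid(logit + prior_log_odds(prior))
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

# A rare event (active in 1% of frames) versus a balanced one (50%).
rare_loss = adjusted_bce(0.0, 1, prior=0.01)
balanced_loss = adjusted_bce(0.0, 1, prior=0.5)
```

With a balanced prior the adjustment vanishes and the loss is plain cross-entropy. With a rare prior, a raw logit of zero corresponds to the base rate, so the raw logit ends up expressing evidence beyond how frequent the label is.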

Justin Salamon

@justinsalamon.bsky.social

Enter FLAM: Frame-Wise Language-Audio Modeling.

A model trained to produce a calibrated likelihood for *any* text prompt.

FLAM outperforms prior self-supervised models on both closed-set and open-set sound event detection (SED), while preserving strong retrieval and zero-shot classification accuracy.

June 24, 2025 at 7:27 PM

Justin Salamon

@justinsalamon.bsky.social

Our goal is for the model to detect *any* sound via free form text queries.

"So use CLAP", some of you will say.

The problem is its output likelihoods are not calibrated for different prompts :(

That's ok for ranked retrieval, but for detection it's a no go.

June 24, 2025 at 7:27 PM
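
The calibration point in the post above can be illustrated with a toy example. The scores below are made up for illustration and are not real CLAP outputs; the prompt and clip names are hypothetical. Ranking only compares clips within one prompt, so it tolerates scores that live on different scales per prompt. Detection applies one threshold across prompts, so it does not.

```python
# Hypothetical text-audio similarity scores, one row per text prompt.
scores = {
    "dog barking": {"clip_a": 0.62, "clip_b": 0.35, "clip_c": 0.30},
    "door slam": {"clip_a": 0.21, "clip_b": 0.28, "clip_c": 0.12},
}

def rank(prompt: str) -> list[str]:
    # Retrieval: order clips by score within a single prompt.
    row = scores[prompt]
    return sorted(row, key=row.get, reverse=True)

def detect(prompt: str, threshold: float) -> set[str]:
    # Detection: keep every clip whose score clears a fixed threshold.
    return {clip for clip, s in scores[prompt].items() if s >= threshold}

# Assume clip_a truly contains the bark and clip_b the door slam.
top_bark = rank("dog barking")[0]
top_slam = rank("door slam")[0]
missed_slam = detect("door slam", 0.5)
noisy_bark = detect("dog barking", 0.25)
```

Ranking puts the right clip first for both prompts. A threshold of 0.5 finds the bark but misses the door slam entirely, while a threshold of 0.25 finds the door slam but flags all three clips as barking, so no single threshold works for both prompts.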

Justin Salamon

@justinsalamon.bsky.social

Sound Event Detection models, i.e., models that find sounds in audio/video recordings, are typically constrained to a predefined "closed" set of sounds, like in this (old!) model below for urban sound detection.

It has some applications, but it doesn't address general purpose sound search.

June 24, 2025 at 7:27 PM

Justin Salamon

@justinsalamon.bsky.social

I think we finally cracked it? FLAM can detect *any* sound via text prompts

arXiv (ICML'25): arxiv.org/abs/2505.053...
demos: flam-model.github.io

Led by Yusong Wu, with @tsirif.bsky.social Ke Chen, Cheng-Zhi Anna Huang, Aaron Courville, @urinieto.bsky.social @pseeth.bsky.social

June 24, 2025 at 7:26 PM

Justin Salamon

@justinsalamon.bsky.social

Generative Extend in Premiere Pro just won *five* awards at NAB 2025, including the NAB Show Product of the Year award! SODA, our group, created the audio GenAI model in charge of audio extensions in the feature. Couldn't be more proud of the team!
w/ @urinieto.bsky.social @pseeth.bsky.social

April 11, 2025 at 5:16 AM

Justin Salamon

@justinsalamon.bsky.social

We didn't expect this... our Sketch2Sound demo video has gone viral on IG with more than 5.2 million views 🤯

Amazing job @hugofloresgarcia.bsky.social @pseeth.bsky.social @urinieto.bsky.social

I should've done my hair...
www.instagram.com/reel/DEEBRhd...

February 22, 2025 at 1:28 AM

Justin Salamon

@justinsalamon.bsky.social

Here's another example of work from our group:

MultiFoley, a Video-to-Audio model that generates perfectly synced audio for video at 48 kHz and supports multimodal conditioning.

More on MultiFoley here: bsky.app/profile/czya...

December 9, 2024 at 7:04 PM

Justin Salamon

@justinsalamon.bsky.social

📢 Audio AI Job opportunity at Adobe!

The Sound Design AI Group (SODA) is looking for an exceptional research engineer to join us in building the future of AI-assisted audio and video creation.

Strong ML background, GenAI experience a plus.

Details: adobe.wd5.myworkdayjobs.com/external_exp...

December 9, 2024 at 7:00 PM
