Christos Plachouras
@cplachouras.bsky.social
researcher & musician, phd student @c4dm (audio-language models), prev @mtg @nyu, chrispla.me, he/him
More in the paper and code: github.com/chrispla/syn.... Big thanks to my collaborators @juj-guinot.bsky.social, George Fazekas, Elio Quinton, @emmanouilb.bsky.social, and Johan Pauwels!
I’ll be at IJCNN 2025 in Rome in a month to present this - see you there!
May 13, 2025 at 2:15 PM
We argue that downstream task evaluation cannot easily uncover these behaviors, and that equivariance, invariance, and disentanglement are critical components that enable a variety of real-world applications like retrieval, generation, and style transfer.
May 13, 2025 at 2:15 PM
Additionally, model and data scale have different effects on the equivariance, invariance, and disentanglement of concepts across different models' representations!
May 13, 2025 at 2:15 PM
Importantly, different pretraining paradigms and architectures behave differently along these axes, suggesting that the underlying mechanisms behind their downstream performance differ.
May 13, 2025 at 2:15 PM
However, we show that representations differ in other important aspects, such as their equivariance (how predictably they change under input transformations), invariance (how stable they remain under them), and disentanglement (how separated different concepts are within them).
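To make these concrete, here's a minimal sketch of how one might probe invariance and equivariance, assuming hypothetical encode and pitch_shift stand-ins (an illustration, not our paper's exact metrics):

```python
# Minimal invariance/equivariance probes for frozen audio embeddings.
# encode() and pitch_shift() are hypothetical stand-ins, not our codebase.
import numpy as np
from sklearn.linear_model import LinearRegression

def invariance(Z, Zt):
    """Mean cosine similarity between clean and transformed embeddings:
    high values mean the representation barely moves under the transform."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    Zt = Zt / np.linalg.norm(Zt, axis=1, keepdims=True)
    return float(np.mean(np.sum(Z * Zt, axis=1)))

def equivariance(Z, Zt):
    """R^2 of a linear map predicting transformed from clean embeddings:
    high values mean the embeddings move, but predictably."""
    return float(LinearRegression().fit(Z, Zt).score(Z, Zt))

# usage (hypothetical): Z = encode(clips); Zt = encode(pitch_shift(clips, 2))
```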
May 13, 2025 at 2:15 PM
There’s widespread suspicion that as model and data scale increase, model representations converge (e.g., the platonic representation hypothesis, arxiv.org/abs/2405.07987), supported in part by different architectures and pretraining paradigms reaching similar downstream performance.
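For reference, "representations converge" is usually quantified with a similarity index such as linear CKA (Kornblith et al., 2019); a minimal sketch, my illustration rather than the hypothesis paper's analysis:

```python
# Linear centered kernel alignment (CKA) between two embedding matrices,
# a common way to quantify whether two models' representations are similar.
import numpy as np

def linear_cka(X, Y):
    """X: (n_samples, dim_x), Y: (n_samples, dim_y), same samples in rows."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den  # ~1.0: equivalent up to rotation/scaling; ~0.0: unrelated
```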
May 13, 2025 at 2:15 PM
more at the audio coding and modeling poster sessions (AASP in poster area 2B) tomorrow (Friday) 8:30am to 10am!
paper link: tinyurl.com/4seckd2c
Learning Music Audio Representations With Limited Data
Large deep-learning models for music, including those focused on learning general-purpose music audio representations, are often assumed to require substantial training data to achieve high performance...
April 10, 2025 at 7:56 AM
finally, we found that regardless of pretraining data scale, the robustness of representations to relevant data perturbations remains concerningly low (rough sketch of one such check below)

(MusiCNN 🟠, VGG 🔵, AST 🟢, CLMR 🔴, TMAE 🟣, MFCC 🩶)
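a rough sketch of one such check, with hypothetical encode and add_noise stand-ins (not our exact protocol, and single-label for simplicity):

```python
# Does a probe trained on clean embeddings survive perturbed inputs?
# Single-label setup for simplicity; MTAT tagging is actually multi-label.
import numpy as np
from sklearn.linear_model import LogisticRegression

def probe_robustness(encode, clips_tr, y_tr, clips_te, y_te, perturb):
    clf = LogisticRegression(max_iter=1000).fit(encode(clips_tr), y_tr)
    clean = clf.score(encode(clips_te), y_te)
    perturbed = clf.score(encode(perturb(clips_te)), y_te)
    return clean, perturbed  # a large gap signals a fragile representation

def add_noise(clips, snr_db=20.0):
    """Additive Gaussian noise at a target signal-to-noise ratio (in dB)."""
    rms = np.sqrt(np.mean(clips**2, axis=-1, keepdims=True))
    noise = np.random.randn(*clips.shape)
    noise_rms = np.sqrt(np.mean(noise**2, axis=-1, keepdims=True))
    return clips + noise * rms / (noise_rms * 10 ** (snr_db / 20))
```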
April 10, 2025 at 7:56 AM
we also saw that handcrafted features actually still outperform learned representations on some tasks (librosa example below)

(MFCCs 🩶, Chroma 🤎)
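for reference, these baselines are cheap to compute; a librosa sketch with mean/std pooling into clip-level vectors (our exact settings may differ):

```python
# Handcrafted baseline features, pooled over time into clip-level vectors.
import numpy as np
import librosa

def handcrafted_features(path):
    y, sr = librosa.load(path, sr=16000, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)  # timbre-oriented
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)    # pitch-class energy
    # mean/std pooling over frames -> fixed-size clip representation
    pool = lambda f: np.concatenate([f.mean(axis=1), f.std(axis=1)])
    return pool(mfcc), pool(chroma)
```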
April 10, 2025 at 7:56 AM
this could be partly attributable to the larger-capacity downstream model (a 2-layer MLP) recovering more relevant information from a “bad” representation, hinting that more data may be making relevant information more accessible (probe comparison sketched below)

(MusiCNN 🟠, VGG 🔵)
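a sketch of that probe-capacity comparison with scikit-learn (our actual probes may be configured differently; single-label for simplicity):

```python
# Linear probe (SLP) vs. a 2-layer MLP probe on the same frozen embeddings.
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

def compare_probes(Z_tr, y_tr, Z_te, y_te):
    slp = LogisticRegression(max_iter=1000).fit(Z_tr, y_tr)
    mlp = MLPClassifier(hidden_layer_sizes=(512,), max_iter=500).fit(Z_tr, y_tr)
    # if the MLP closes the gap on a "bad" representation, the information
    # was present all along, just not linearly accessible
    return slp.score(Z_te, y_te), mlp.score(Z_te, y_te)
```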
April 10, 2025 at 7:56 AM
we found that representations from some untrained models perform surprisingly well at tagging, and that the tagging performance gap across pretraining scales isn’t very large (untrained-encoder sketch below)

(MusiCNN 🟠, VGG 🔵, AST 🟢, CLMR 🔴, TMAE 🟣, MFCC 🩶)
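the untrained baseline is literally a randomly initialized encoder used as a frozen feature extractor; a hypothetical PyTorch sketch (not one of the actual architectures above):

```python
# Features from a randomly initialized (never trained) CNN over mel spectrograms.
import torch
import torch.nn as nn

encoder = nn.Sequential(  # hypothetical small CNN, not MusiCNN/VGG/AST
    nn.Conv2d(1, 32, 3, stride=2), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
encoder.eval()  # random weights; no training step at all

with torch.no_grad():
    mel = torch.randn(8, 1, 96, 256)  # stand-in batch of log-mel spectrograms
    Z = encoder(mel)                  # (8, 64) embeddings fed to the probes
```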
April 10, 2025 at 7:56 AM
we created subsets of MTAT ranging from 5 to 8,000 minutes of audio and pretrained MusiCNN, a VGG, AST, CLMR, and a transformer-based MAE on them. we then extracted representations and trained single-layer (SLP) and two-layer (MLP) probes on music tagging, instrument recognition, and key detection (subset sketch below)
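a sketch of how such duration-budgeted subsets could be drawn (illustrative; our actual sampling strategy may differ):

```python
# Draw a random subset of tracks whose total duration fits a minute budget.
import random

def make_subset(tracks, minutes, clip_seconds=29):  # MTAT clips are ~29 s
    budget, total, subset = minutes * 60, 0, []
    random.shuffle(tracks)
    for track in tracks:
        if total + clip_seconds > budget:
            break
        subset.append(track)
        total += clip_seconds
    return subset

# e.g. subsets = {m: make_subset(all_tracks, m) for m in (5, 50, 500, 8000)}
```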
April 10, 2025 at 7:56 AM