rbalestr.bsky.social
@rbalestr.bsky.social
Our solution is to train an SSL denoiser solely to create a data curriculum for the SSL method you are interested in. By first observing denoised samples and gradually returning to the original samples, the final SSL model performs better than the baseline!
May 20, 2025 at 2:38 PM
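The curriculum described above can be sketched as a simple blend that anneals from denoised samples back to the raw data. This is a minimal numpy sketch: the linear schedule, the `denoiser` interface, and the function names are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def curriculum_batch(x, denoiser, step, total_steps):
    """Blend denoised and original samples: training starts on fully
    denoised data and linearly anneals back to the raw samples.
    (Hypothetical interface; the actual schedule may differ.)"""
    alpha = min(step / total_steps, 1.0)  # 0 -> denoised, 1 -> original
    return (1.0 - alpha) * denoiser(x) + alpha * x

# toy usage: a trivial stand-in for a trained SSL denoiser
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
denoise = lambda z: np.zeros_like(z)  # stand-in, NOT a real denoiser
start = curriculum_batch(x, denoise, step=0, total_steps=100)
end = curriculum_batch(x, denoise, step=100, total_steps=100)
```

At `step=0` the model only sees the denoiser's output; by `step=total_steps` the denoiser is out of the loop entirely, so the deployed pipeline stays denoiser-free.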
With high levels of noise, it is standard to have a denoiser as part of the train/test preprocessing pipeline... but this has drawbacks, e.g. it adds a bias to your pipeline, complicates cross-validation, and is sensitive to distribution shifts... AI/SSL should strive for denoiser-free pipelines!
May 20, 2025 at 2:38 PM
The spline connection offers closed-form answers to many open questions about SAEs--and provides clear, actionable solutions such as our PAM-SGD training algo. PAM-SGD is EM-like, alternating between partition/region assignment and parameter updates, and outperforms typical Adam/SGD training.
May 20, 2025 at 2:08 PM
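The EM-like alternation can be illustrated on a top-k SAE: the E-step fixes each sample's spline region (its active-latent support), and the M-step solves the codes in closed form on that frozen support. This is a hypothetical numpy sketch of the idea; the names (`pam_step`) and the least-squares M-step are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def pam_step(X, W_enc, W_dec, k):
    """One EM-like alternation (illustrative sketch, not the real PAM-SGD):
    E-step: fix the partition, i.e. each sample's top-k active latents;
    M-step: with supports frozen, refit the codes in closed form."""
    Z = X @ W_enc                                  # pre-activations
    supp = np.argsort(-np.abs(Z), axis=1)[:, :k]   # region assignment
    Z_sparse = np.zeros_like(Z)
    X_hat = np.zeros_like(X)
    for i in range(X.shape[0]):
        D = W_dec[supp[i]]                         # active decoder atoms
        # closed-form codes on the fixed support (least squares)
        c, *_ = np.linalg.lstsq(D.T, X[i], rcond=None)
        Z_sparse[i, supp[i]] = c
        X_hat[i] = c @ D
    return Z_sparse, X_hat
```

Freezing the region assignment is what makes the inner problem convex, which is where the closed-form updates come from.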
The findings stem from expressing SAEs as splines (arxiv.org/abs/2408.04809) and doing a deep dive into their partition, constraints, and underlying geometry! We not only characterize their input-space partition and geometry, but also tie SAEs to common methods such as k-means and PCA
May 20, 2025 at 2:08 PM
Want better training and geometric insights for Sparse AutoEncoders (SAEs)? Search no more... We leverage spline theory to provide a new "EM-like" training algo (PAM-SGD) and to delve into SAE geometry with connections to PCA, k-means, and more...

arxiv.org/abs/2505.11836
May 20, 2025 at 2:08 PM
That bias towards capturing details manifests as distinct attention behavior within ViTs. Building on those findings, we propose a new token aggregator that counters this attention bias without having to finetune the backbone -> gains in linear-probe performance!
December 5, 2024 at 6:47 PM
Learning by reconstruction captures uninformative details in your data. This “attention to details” biases the ViT’s attention. Our solution: a new token aggregator -> significantly improves MAE linear-probe perf. and slightly improves JEPAs like I-JEPA
arxiv.org/abs/2412.03215
December 5, 2024 at 6:47 PM
We propose an approach that combines segmentation and association of geographic entities in historical maps using video instance segmentation (VIS). Combined with a novel method for generating synthetic videos from unlabeled historical maps, we produce SSL models with high accuracy.
December 2, 2024 at 2:22 PM
Understanding the evolution of historical maps is key to tracking the development of civilizations (urbanization, environmental changes, ...). We show how to use Self-Supervised Learning to do that without supervision!
arxiv.org/abs/2411.17425
(SSL workshop NeurIPS24)
December 2, 2024 at 2:22 PM