Damien Teney
damienteney.bsky.social
Research Scientist @ Idiap Research Institute. @idiap.bsky.social
Adjunct lecturer @ Australian Institute for ML. @aimlofficial.bsky.social
Occasionally cycling across continents.
https://www.damienteney.info
🔎Last but not least: OOD-Chameleon sheds new light on existing algorithms! We can interpret the selection process as a tree and derive interpretable guidelines for choosing algorithms.
July 7, 2025 at 4:51 PM
✅We test OOD-Chameleon on unseen datasets & shifts: it accurately predicts suitable algorithms on synthetic, vision, and language tasks. The downstream models (trained with the selected algorithms) consistently have lower error than with standard selection heuristics.
July 7, 2025 at 4:51 PM
🛠️To create our "dataset of datasets", we resample CelebA and CivilComments with constraints specifying diverse types/magnitudes of shifts. We also train small models with candidate algorithms to obtain their "ground truth performance" in each condition.
July 7, 2025 at 4:51 PM
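A minimal sketch of this kind of constrained resampling (illustrative only, not the paper's exact procedure): given binary labels and a binary attribute, draw a subset whose label–attribute agreement rate matches a requested level, which sets the magnitude of the spurious correlation.

```python
import numpy as np

# Illustrative stand-ins for a labeled dataset with a binary attribute,
# e.g. a CelebA label and one of its annotated attributes.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=10_000)  # labels
a = rng.integers(0, 2, size=10_000)  # attribute

def resample(y, a, p_match=0.9, n=2_000):
    """Draw n example indices so that P(a == y) ~= p_match."""
    match = np.flatnonzero(y == a)
    clash = np.flatnonzero(y != a)
    n_match = int(round(p_match * n))
    return np.concatenate([
        rng.choice(match, n_match, replace=False),
        rng.choice(clash, n - n_match, replace=False),
    ])

idx = resample(y, a, p_match=0.9)
print(np.mean(y[idx] == a[idx]))  # ~0.9 agreement in the resampled set
```

Sweeping `p_match` (and analogous knobs for label or covariate shift) is one way to obtain training sets with controlled types/magnitudes of shift.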
🦎We propose OOD-Chameleon as a proof-of-concept: an algorithm selector as a classifier (over candidate algorithms) trained on a "dataset of datasets" representing diverse shifts. The model learns which algorithms perform best in different conditions.
July 7, 2025 at 4:51 PM
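A toy sketch of what such a selector could look like (all descriptor names, algorithm names, and the decision rule here are illustrative, not the paper's): summarize each past dataset by a few descriptors, label it with the algorithm that worked best, then match a new dataset to its nearest neighbors.

```python
import numpy as np

rng = np.random.default_rng(0)
ALGORITHMS = ["ERM", "reweighting", "group-DRO"]  # candidate algorithms

# Toy "dataset of datasets": descriptors = (normalized dataset size,
# spurious-correlation strength, label-shift magnitude).
descriptors = rng.uniform(0.0, 1.0, size=(200, 3))
# Toy ground-truth rule standing in for measured performance:
# strong spurious correlation -> group-DRO, strong label shift ->
# reweighting, otherwise plain ERM.
labels = np.where(descriptors[:, 1] > 0.6, 2,
                  np.where(descriptors[:, 2] > 0.6, 1, 0))

def select_algorithm(new_descriptor, k=5):
    """Selector as a classifier: k-NN vote over past datasets' best algorithms."""
    dists = np.linalg.norm(descriptors - new_descriptor, axis=1)
    nearest = labels[np.argsort(dists)[:k]]
    return ALGORITHMS[np.bincount(nearest).argmax()]

# New dataset with a strong spurious correlation:
print(select_algorithm(np.array([0.5, 0.9, 0.1])))
```

In practice the labels would come from actually training candidate algorithms on each dataset, and the classifier could be any learned model rather than k-NN.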
💡We're aiming for an "auto-ML for distribution shifts". We conjecture that datasets have properties predictive of the suitability of various algorithms to handle distribution shifts: size/complexity of the data, magnitudes/types of shifts, etc.
July 7, 2025 at 4:51 PM
"Distribution shift" means many things: spurious correlations, covariate shift, label shift... with no one-size-fits-all! Many algorithms exist, 📊each for specific conditions. Could we automate the selection without trial-and-error❓
July 7, 2025 at 4:51 PM
Coming up at ICML: 🤯Distribution shifts are still a huge challenge in ML. There's already a ton of algorithms to address specific conditions. So what if the challenge was just selecting the right algorithm for the right conditions?🤔🧵
July 7, 2025 at 4:51 PM
What I'm criticizing is the motivated reasoning, e.g. seeing evil in field-specific jargon that has a perfectly valid technical meaning but isn't obvious to people unfamiliar with the CV literature. It's unfortunate, because the paper has some valid points; it just completely lacks balance.
June 29, 2025 at 4:11 PM
In summary, the research community has done a great job finding near-optimal choices for popular tasks. But some use cases could benefit from completely different designs! Next: how is this relevant to transformers? Check out the paper or come chat at CVPR! arxiv.org/abs/2503.10065
June 7, 2025 at 11:19 AM
Lastly, for shortcut learning (an undesirable focus on simple features), we also find (weak) effects when steering the learning toward simple or complex features.
June 7, 2025 at 11:19 AM
We also look at grokking (delayed convergence on algorithmic tasks), and find that a proper choice of activation function entirely eliminates the issue.
June 7, 2025 at 11:19 AM
We examine the functions implemented by various architectures (heatmaps of logits on 2D slices of input space). The differences are striking!
- ReLUs: smooth functions
- Learned activations: sharp, axis-aligned decision boundaries, very similar to trees.
June 7, 2025 at 11:19 AM
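A minimal sketch of this kind of visualization: evaluate a model's logit on a plane through input space spanned by two directions, and render the result as a heatmap. The "model" below is a hypothetical stand-in with tree-like, axis-aligned boundaries; in practice it would be a trained network.

```python
import numpy as np

def logit_slice(model, x0, d1, d2, extent=2.0, res=64):
    """Evaluate model logits on the 2D slice x0 + u*d1 + v*d2."""
    u = np.linspace(-extent, extent, res)
    grid = np.stack(np.meshgrid(u, u), axis=-1)            # (res, res, 2)
    points = x0 + grid[..., :1] * d1 + grid[..., 1:] * d2  # (res, res, dim)
    return model(points.reshape(-1, len(x0))).reshape(res, res)

# Stand-in "network": a sharp, axis-aligned decision function.
model = lambda X: np.sign(X[:, 0]) * (np.abs(X[:, 1]) > 1.0)

heat = logit_slice(model, x0=np.zeros(2),
                   d1=np.array([1.0, 0.0]), d2=np.array([0.0, 1.0]))
print(heat.shape)  # (64, 64); render with e.g. plt.imshow(heat)
```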
Next, tabular data. These datasets often contain categorical/low-cardinality variables, where a small change in input means a completely different output. The learned activations again help a lot, as they can represent sharp transitions in the learned function.
June 7, 2025 at 11:19 AM
We can indeed examine the models' accuracy vs. the complexity (Fourier spectrum) of the learned functions. On regression tasks, there's a clear pattern: the ability to represent complex functions correlates with better accuracy.
June 7, 2025 at 11:19 AM
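A minimal sketch of this kind of complexity measure (the paper's exact measure may differ): sample a function on a uniform grid, take its Fourier transform, and score complexity by the fraction of spectral mass above a frequency cutoff. Smooth functions concentrate mass at low frequencies; functions with sharp transitions spread it upward.

```python
import numpy as np

def spectrum(f, n=1024):
    """Magnitudes of the Fourier coefficients of f sampled on [0, 1)."""
    x = np.linspace(0.0, 1.0, n, endpoint=False)
    return np.abs(np.fft.rfft(f(x))) / n

def high_freq_mass(f, cutoff=32, n=1024):
    """Fraction of spectral mass above `cutoff`: a crude complexity score."""
    mag = spectrum(f, n)
    return mag[cutoff:].sum() / mag.sum()

smooth = high_freq_mass(lambda x: np.sin(2 * np.pi * x))          # one low frequency
sharp = high_freq_mass(lambda x: np.sign(np.sin(2 * np.pi * x)))  # square wave
print(smooth, sharp)  # the step-like function has far more high-frequency mass
```

For a learned network, `f` would be the model evaluated along a line (or slice) in input space.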
Next, regression tasks, which are typically more difficult than classification for NNs. New learned activations help a lot here. They contain irregularities that help represent complex functions with sharper transitions.
June 7, 2025 at 11:19 AM
First, a sanity check with MNIST, CIFAR...
Optimizing the activation function doesn't make much difference. But when initialized from a ReLU, we rediscover smooth variants similar to GELUs! Clearly a local optimum that researchers homed in on by trial and error over the years!
June 7, 2025 at 11:19 AM
With a new bilevel method to optimize activation functions (details in the paper), we identify tasks where the simplicity bias is detrimental. We discover new activation functions enabling better generalization by helping represent complex functions.
June 7, 2025 at 11:19 AM
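The bilevel method itself is in the paper; this is only a sketch of the kind of search space involved, under the assumption of a piecewise-linear parameterization (all names here are illustrative). The outer loop of a bilevel scheme would adjust the values at the knots; the inner loop would train a network using the resulting activation.

```python
import numpy as np

knots = np.linspace(-3.0, 3.0, 13)  # fixed grid of breakpoints
values = np.maximum(knots, 0.0)     # initialize the search at ReLU

def activation(x, values=values):
    """Piecewise-linear activation defined by its values at the knots."""
    return np.interp(x, knots, values)

# Initialized at ReLU, the parameterized function reproduces it
# (up to float rounding) inside the knot range:
x = np.linspace(-2.5, 2.5, 101)
print(np.max(np.abs(activation(x) - np.maximum(x, 0.0))))
```

Perturbing `values` away from the ReLU initialization is what lets the search discover smooth GELU-like variants, or the sharper, irregular activations described above.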
Simplicity is an intuitive prior, and ReLUs are widely successful, but there can't be *universal* inductive biases. So, are there tasks where the simplicity bias is holding us back?
June 7, 2025 at 11:19 AM
We already know that neural networks have an inbuilt "simplicity bias" that primarily depends on using ReLU-like activation functions.
June 7, 2025 at 11:19 AM
Coming up this week: (oral @cvprconference.bsky.social)
Do We Always Need the Simplicity Bias?
We take another step to understand why/when neural nets generalize so well. ⬇️🧵
June 7, 2025 at 11:19 AM
Seen on the other place... 😕🤷
Any advice to get more ML and less bunnies/politics in my Bluesky feed?
April 22, 2025 at 7:14 PM
Here are a few more friends for you. You're welcome.
March 4, 2025 at 10:27 PM
Very interesting! (still have to read the details). Looks like you missed our work from last year, which also showed how important the initial weights are, especially with non-ReLU activations.
arxiv.org/abs/2403.02241
"Neural Redshift: Random Networks are not Random Functions"
(Fig. 14 from the appendix ⬇️)
March 1, 2025 at 2:10 AM
This is as useful as telling us what the authors had for breakfast on the day of the experiments. 🤷
February 27, 2025 at 10:54 PM
💡 Just learned about a useful shorthand notation for "expectation". Seems common for physicists, but I can't remember coming across it before. With an example use case below ⬇️
December 8, 2024 at 9:49 AM