Guillaume Bellec
@bellecguill.bsky.social
AI, Neuroscience and Music
Tbf, people did request SE or STD in RL where things are less stable.

But now for big LLMs it is not uncommon (although not ideal) to manually restart to intermediate checkpoints.

If the model you publish is strong and people reproduce your work within 3 months, your work is very important.
October 27, 2025 at 7:43 AM
Nope, SE or STD over models do not really capture this.

Depending on learning rate or net size, you can have one model init at 50% acc and another 4 at 98.5%, making the STD and SE large and the diff with a CNN at 99% insignificant. Yet everybody knows it is as reproducible as the sunset and the sunrise.
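A minimal numeric sketch of this point (accuracies are invented for illustration, not measured results):

```python
import statistics

# Hypothetical MNIST accuracies over 5 MLP seeds: one bad init, four good ones.
mlp_accs = [0.500, 0.985, 0.985, 0.985, 0.985]
cnn_acc = 0.990  # a decent CNN

mean = statistics.mean(mlp_accs)        # 0.888
std = statistics.stdev(mlp_accs)        # ~0.217 (sample STD, blown up by one seed)
se = std / len(mlp_accs) ** 0.5         # ~0.097

# The CNN sits well inside one STD of the MLP mean, so the
# 98.5% vs 99% margin looks "insignificant" on paper, even though
# everyone who has trained these models knows it is reproducible.
print(f"mean={mean:.3f}, std={std:.3f}, se={se:.3f}")
print(abs(cnn_acc - mean) < std)  # True
```

One outlier seed dominates the STD, so the error bar says nothing about the reproducibility of the well-trained models.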
October 27, 2025 at 7:34 AM
Ok. An example: MNIST

Every ML researcher knows 98.5% means maxing out an MLP and 99% a decent CNN. You know the margin is reproducible if you have worked with this.

But reporting STD will not easily capture this margin. Neither will shuffling the training/test split, nor STD over models.
October 26, 2025 at 7:18 PM
Indeed, bad example 😅
October 26, 2025 at 5:23 PM
I agree that inferential versus descriptive is important. But how does STD versus SE address the issue?

In ML for Neuro, the sqrt(n) factor is obscure. n may mean: num of animals, num of models, num of CV shuffles...

The error bar will not represent reproducibility well either way.
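The sqrt(n) ambiguity can be made concrete: the same spread yields very different SEs depending on what counts as n (toy numbers, purely illustrative):

```python
import math

std = 0.20  # same spread in every case

# SE = STD / sqrt(n), but which n? The choice changes the error bar by 5x.
for label, n in [("animals", 6), ("model seeds", 4), ("CV shuffles", 100)]:
    se = std / math.sqrt(n)
    print(f"n = {n:3d} ({label}) -> SE = {se:.3f}")
```

With n = 4 seeds the SE is 0.10; with n = 100 CV shuffles it shrinks to 0.02, so the same experiment can look noisy or precise depending on an arbitrary bookkeeping choice.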
October 26, 2025 at 1:32 PM
Maybe neuroscience has been over-obsessed with statistical tests, wasting time rebranding means, STDs, and linear regressions behind complicated tests.

Just to find out in the next batch of papers that those statistical tests are easy to cheat, or yield significant but irrelevant results.
October 26, 2025 at 12:19 PM
Or maybe neuro-statisticians should learn from ML.

1st: Yes, most ML researchers had strong training in applied math, including stats and probability.

2nd: When the reproducibility that matters is clear, the ML field quickly agrees on a simple metric. So no need to make things complicated.
October 26, 2025 at 12:16 PM
Okay, sorry, I had misunderstood from the context of our last discussion, cool.

End-to-end back-prop works wonderfully well, so it's good to study this case too. Cool work!
October 25, 2025 at 5:48 PM
Cool. Does the modulation completely replace the data augmentation via image transformations? (Does your self-sup still work to within a few percent of acc if you disable data augmentation completely?)

Are all the gradients computed layer-wise for the weight updates?
October 25, 2025 at 4:30 PM
Cool work !!
I only had a quick read but congrats 🎉

The idea of modeling top-down plasticity as rare labeled 1-class-vs-all classifiers is very elegant.

The ff parameters W are learned via contrastive learning? But I did not understand what the modulation does on an unlabeled sample?
October 24, 2025 at 8:37 AM
I thought for a moment that you had written that song or something. Then I googled it.
October 11, 2025 at 12:20 AM
Maybe this is useful? In-context learning in RNNs compared to cognitive science in 2016
arxiv.org/abs/1611.05763

Also in-context learning, even in spiking RNNs without synaptic plasticity, in 2018
arxiv.org/abs/1803.09574
Learning to reinforcement learn
In recent years deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains. However, a major limitation of such applications is their demand ...
arxiv.org
October 5, 2025 at 9:38 PM