But now, for big LLMs, it is not uncommon (although not ideal) to manually restart from intermediate checkpoints.
If the model you publish is strong and people reproduce your work within 3 months, your work is very important.
Depending on the learning rate or net size, you can have one model init at 50% accuracy and another four at 98.5%, making the STD and SE large and the difference with a CNN at 99% statistically insignificant. Yet everybody knows it is reproducible, like the sunset and the sunrise.
Every ML researcher knows 98.5% means maxing out an MLP and 99% a decent CNN. You know the margin is reproducible if you have worked with this.
But reporting the STD will not easily capture this margin. Not by shuffling the training/test split. Not by taking the STD over models.
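A minimal sketch of the point, with made-up seed accuracies and a Welch t-test over seeds (all numbers hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical test accuracies over 5 random seeds.
# One MLP seed got stuck near chance, which inflates the STD.
mlp = np.array([0.50, 0.985, 0.985, 0.985, 0.985])
cnn = np.array([0.990, 0.990, 0.991, 0.989, 0.990])

print("MLP: %.3f +/- %.3f" % (mlp.mean(), mlp.std(ddof=1)))
print("CNN: %.3f +/- %.3f" % (cnn.mean(), cnn.std(ddof=1)))

# Welch t-test: the huge MLP variance makes p large,
# even though the 98.5% vs 99% margin is perfectly stable
# once the degenerate run is excluded.
t, p = stats.ttest_ind(mlp, cnn, equal_var=False)
print("p-value:", p)
```

The pooled STD over seeds hides a margin that anyone who has trained these models knows is reproducible.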
In ML for neuroscience, the sqrt(n) factor is obscure: n may mean the number of animals, the number of models, the number of CV shuffles...
Either way, the error bar will not represent reproducibility well.
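A quick sketch of the ambiguity, with a hypothetical 2% STD and the standard error SE = STD / sqrt(n) for different choices of n:

```python
import numpy as np

# Hypothetical: the same 2% STD, but the standard error
# depends entirely on what n counts.
std = 0.02
for label, n in [("animals", 6), ("models (seeds)", 10), ("CV shuffles", 100)]:
    se = std / np.sqrt(n)
    print(f"n = {n:3d} {label:15s} -> SE = {se:.4f}")
```

Same data, three very different error bars, depending on which unit you call an independent sample.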
Just to find out in the next batch of papers that those statistical tests are easy to game, or that they produce significant but irrelevant results.
1st: Yes, most ML researchers had strong training in applied math, including statistics and probability.
2nd: When the reproducibility that matters is clear, the ML field quickly agrees on a simple metric, so there is no need to make things complicated.
End-to-end back-prop works wonderfully well, so it is good to study this case too. Cool work!
Are all the gradients computed layer-wise for the weight update?
I only had a quick read but congrats 🎉
The idea of modeling top-down plasticity as rare labeled 1-class-vs-all classifiers is very elegant.
The feedforward parameters W are learned via contrastive learning, right? But I did not understand what the modulation does for unlabeled samples.
arxiv.org/abs/1611.05763
Also, in-context learning even in spiking RNNs without synaptic plasticity, in 2018:
arxiv.org/abs/1803.09574