With a quick numerical test, I recover confirmation bias in both halves when fitting to Bayes-optimal behavior. Therefore, I do not think the schematic is accurate.
I can share my code.
Are you saying that had the learning rates been decaying with time, we would not have observed this effect?
If so, it is unclear to me what dynamical features that would introduce. I think it would require a detailed analysis.
Also, what's normative depends on the assumed model class (e.g., change-points vs. random walk). In change-point models, learning rates spike at changes and decay otherwise [1].
[1] papers.nips.cc/paper_files/...
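To make the change-point case concrete, here is a minimal sketch under strong simplifying assumptions: observations with a flat prior on each segment's mean, and change times known exactly (an oracle, not inferred). Under those assumptions the optimal estimate is the average since the last change, which a delta rule reproduces with a learning rate of 1/(run length). The function name `changepoint_delta_rule` is hypothetical and this is not the model from [1].

```python
def changepoint_delta_rule(observations, change_times):
    """Delta-rule estimate with learning rate 1/(run length since last change).

    Illustrative assumption: change-point times are known exactly, so the
    learning rate resets to 1 at each change and decays as 1/k afterwards.
    """
    change_times = set(change_times)
    estimate = 0.0
    run_length = 0
    estimates, rates = [], []
    for t, x in enumerate(observations):
        if t in change_times:
            run_length = 0  # change point: discard the previous segment
        run_length += 1
        alpha = 1.0 / run_length  # spikes to 1 at a change, decays otherwise
        estimate += alpha * (x - estimate)  # equals the mean since the last change
        estimates.append(estimate)
        rates.append(alpha)
    return estimates, rates


est, rates = changepoint_delta_rule([1, 1, 1, 5, 5, 5], change_times=[3])
print(est)    # estimate jumps to the new mean right after the change
print(rates)  # 1, 1/2, 1/3, then back to 1, 1/2, 1/3
```

Without any change points this reduces to a 1/n decaying learning rate, i.e. a running sample mean.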
In this study, I use master equations (commonly used in statistical physics) to derive analytical expressions for key observables. This approach could be very useful for studying the learning dynamics of RL algorithms without having to run costly simulations.
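As a toy illustration of the idea (not the derivation in the paper), consider a single delta-rule learner Q ← Q + α(r − Q) with Bernoulli(p) rewards. Its master equation maps P_t(Q) to P_{t+1}(Q), and taking the first moment gives E[Q_{t+1}] = (1 − α)E[Q_t] + αp, so E[Q_t] = p + (Q_0 − p)(1 − α)^t. The sketch below propagates the distribution exactly over its finite support and checks it against that closed form; all function names are hypothetical.

```python
from collections import defaultdict


def evolve_distribution(q0, alpha, p, steps):
    """Exactly propagate P_t(Q) for Q <- Q + alpha * (r - Q), r ~ Bernoulli(p)."""
    dist = {q0: 1.0}
    for _ in range(steps):
        new = defaultdict(float)
        for q, prob in dist.items():
            new[q + alpha * (1 - q)] += p * prob        # rewarded trial
            new[q + alpha * (0 - q)] += (1 - p) * prob  # unrewarded trial
        dist = dict(new)
    return dist


def mean_of(dist):
    return sum(q * prob for q, prob in dist.items())


def analytic_mean(q0, alpha, p, t):
    # First moment of the master equation: E[Q_t] = p + (Q_0 - p)(1 - alpha)^t
    return p + (q0 - p) * (1 - alpha) ** t


dist = evolve_distribution(q0=0.0, alpha=0.2, p=0.7, steps=10)
print(mean_of(dist), analytic_mean(0.0, 0.2, 0.7, 10))
```

The support grows as 2^t, so exact enumeration is only practical for short horizons; the point of the analytical route is to get observables like this mean without enumerating or simulating.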
Full paper: www.pnas.org/doi/10.1073/...
This is not quite accurate. In the paper, I show that a large class of temporal learning-rate profiles (ones that are not Bayes-optimal) might lead to the appearance of bias.