Find our paper here: openreview.net/pdf?id=EFrgB...
💻 We give explicit upper and lower bounds on the tail-index of the resulting parameter distribution and validate these bounds in numerical experiments.
TL;DR: Heavy-tailed parameter distributions can emerge from locally Gaussian gradient noise, as we show both theoretically and empirically.
It has repeatedly been observed that loss minimization by stochastic gradient descent (SGD) leads to heavy-tailed distributions of neural network parameters.
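To make the phenomenon concrete, here is a minimal sketch (not the paper's actual experiment): constant-step-size SGD on one-dimensional least squares with Gaussian data already produces heavy-tailed iterates through multiplicative gradient noise, even though each minibatch gradient is light-tailed, and a Hill estimator recovers a finite tail index. The step size, burn-in, and cutoff k below are illustrative assumptions.

```python
# Minimal sketch, not the paper's setup: constant-step SGD on
# 1-d least squares with Gaussian data. The update is
#   theta <- (1 - eta * x^2) * theta + eta * x * y,
# a random multiplicative recursion whose stationary law is
# heavy-tailed despite light-tailed per-step gradient noise.
import numpy as np

rng = np.random.default_rng(0)
eta, n_steps, burn_in = 0.5, 200_000, 10_000  # illustrative choices

theta, samples = 0.0, []
for t in range(n_steps):
    x = rng.normal()                    # fresh input sample
    y = rng.normal()                    # target (true parameter is 0)
    theta -= eta * x * (x * theta - y)  # single-sample SGD step
    if t >= burn_in:
        samples.append(abs(theta))

# Hill estimator of the tail index from the k largest |theta| values.
# Iterates are correlated, so treat the estimate as illustrative only.
samples = np.sort(np.asarray(samples))
k = 1_000
tail = samples[-k:]
alpha_hat = k / np.log(tail / tail[0]).sum()
print(f"Hill tail-index estimate: {alpha_hat:.2f}")
```

In this sketch, increasing eta makes the tails heavier (a smaller estimated alpha), while a very small eta pushes the tail index up so far that the heavy tail is hard to detect at this sample size.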