My work focuses on learning dynamics in biologically plausible neural networks. #NeuroAI
We define the curl descent learning rule by flipping the sign of the gradient-descent updates for some weights. These weights are chosen at the start of learning, depending on the nature (rule-flipped or not) of the presynaptic neuron.
This is motivated by the diversity observed in the brain: a given weight-update signal can have opposite effects on a network's computation depending on the postsynaptic neuron (e.g., E/I), which is inconsistent with standard gradient descent.
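For concreteness, here is a minimal sketch of what such a sign-flipped update can look like. The sizes, names, and flip fraction below are illustrative assumptions, not values from the actual work:

```python
# Minimal sketch of a curl-descent-style update; the sizes, names, and 20% flip
# fraction are illustrative assumptions, not values from the actual work.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 10, 5

W = 0.1 * rng.standard_normal((n_hidden, n_in))   # weights, rows = postsynaptic units

# Mark some presynaptic (input) neurons as "rule-flipped" once, before learning starts.
flipped_pre = rng.random(n_in) < 0.2
sign = np.where(flipped_pre, -1.0, 1.0)           # -1 reverses the gradient update

def curl_step(W, grad_W, lr=1e-2):
    """Gradient step in which weights from rule-flipped presynaptic neurons ascend the loss."""
    return W - lr * (grad_W * sign[None, :])      # one sign per weight column (presynaptic neuron)
```

With every sign set to +1 this is exactly gradient descent; the flipped columns are what introduce the non-gradient (curl) component of the flow.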
But can networks with such non-gradient learning dynamics still support meaningful optimization? We address this question in an analytically tractable teacher-student framework with 2-layer feedforward linear networks.
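A rough sketch of what such a teacher-student setup can look like; the layer sizes, sample count, and initialization scale are placeholder assumptions:

```python
# Teacher-student setup with 2-layer feedforward linear networks; layer sizes,
# sample count, and initialization scale are placeholder assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, n_out, n_samples = 8, 4, 2, 256

# Fixed teacher that generates the targets.
W1_teacher = rng.standard_normal((n_hidden, n_in))
W2_teacher = rng.standard_normal((n_out, n_hidden))

# Student to be trained on the teacher's input-output mapping.
W1 = 0.1 * rng.standard_normal((n_hidden, n_in))
W2 = 0.1 * rng.standard_normal((n_out, n_hidden))

X = rng.standard_normal((n_in, n_samples))        # one column per input sample
Y = W2_teacher @ (W1_teacher @ X)                 # teacher outputs

def loss_and_grads(W1, W2):
    """Squared error of the linear student and its gradients for both layers."""
    H = W1 @ X                                    # hidden activity
    E = W2 @ H - Y                                # output error
    loss = 0.5 * np.mean(np.sum(E ** 2, axis=0))
    grad_W2 = E @ H.T / n_samples
    grad_W1 = W2.T @ E @ X.T / n_samples
    return loss, grad_W1, grad_W2
```

Curl descent is then obtained by passing these gradients through a fixed sign mask like the one sketched above; the linearity is what keeps the resulting learning dynamics analytically tractable.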
Toy example: in a tiny 2-synapse network, curl descent can escape saddle points and converge faster than gradient descent by temporarily ascending the loss function. But it comes at a cost: half of the optimal solutions become unstable.
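The "temporarily ascending" part can be made concrete with a small calculation. Assuming a simple stand-in loss (not necessarily the exact toy problem from the work), flipping one of the two updates makes the instantaneous loss change equal to -(dL/dw1)^2 + (dL/dw2)^2, which is positive whenever the flipped synapse carries the larger gradient:

```python
# Why the loss can transiently increase: with the update of w2 sign-flipped, the
# instantaneous loss change along the flow is -(dL/dw1)**2 + (dL/dw2)**2, which is
# positive whenever the flipped synapse carries the larger gradient. The loss below,
# L = 0.5 * (w1*w2 - 1)**2, is an assumed stand-in, not necessarily the original toy.
import numpy as np

def grad(w1, w2):
    e = w1 * w2 - 1.0
    return np.array([e * w2, e * w1])             # (dL/dw1, dL/dw2)

w = np.array([1.5, 0.2])                          # here |dL/dw2| > |dL/dw1|
g = grad(*w)
signs = np.array([1.0, -1.0])                     # flip the second synapse's update
update = -signs * g                               # curl descent direction

print("instantaneous loss change:", g @ update)   # positive => momentarily ascending
```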
How does this scale up? Using random matrix theory, we find that stability depends critically on network architecture: expansive networks (input layer larger than the hidden layer) are much more robust to curl terms, maintaining stable solutions even with many rule-flipped neurons.
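The stability question behind this analysis can also be checked numerically for a given network. A sketch, assuming the readout weights of the linear student and per-presynaptic-neuron sign flips (the sizes, hidden activity, and flip pattern are arbitrary stand-ins): near a zero-error solution the flipped dynamics linearize to d(delta)/dt = -D C delta, where D holds the +-1 signs and C is the hidden-activity covariance, so the solution stays linearly stable only if D C has no eigenvalue with negative real part.

```python
# Numerical check of linear stability at a readout-layer solution once signs are flipped.
# Sketch only: the hidden activity, sizes, and flip pattern are arbitrary stand-ins,
# not the original random-matrix calculation.
import numpy as np

rng = np.random.default_rng(2)
n_hidden, n_samples = 6, 400

H = rng.standard_normal((n_hidden, n_samples))     # stand-in hidden activity
C = H @ H.T / n_samples                            # hidden-activity covariance

flipped = np.array([True, False, False, True, False, False])
D = np.diag(np.where(flipped, -1.0, 1.0))          # one +-1 sign per presynaptic neuron

# Near a zero-error solution the flipped readout dynamics linearize to
#   d(delta)/dt = -D @ C @ delta,
# so the solution is linearly stable only if eig(D @ C) has no negative real part.
eigs = np.linalg.eigvals(D @ C)
print("min real part of eig(D C):", eigs.real.min())   # negative => solution destabilized
```

How the spectrum of D C behaves as the layer sizes and the number of flipped neurons grow is exactly the kind of question the random-matrix analysis answers at scale.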
But beware! If you add too many rule-flipped neurons in the hidden layer of a compressive network, the learning dynamics spiral into chaos, destroying performance: the weights never settle, and the error stays high.
The story is completely different for the readout layer. Surprisingly, even when curl terms make the solution manifold unstable, the network is still able to find other low-error regions!
When only a single rule-flipped synapse is introduced in the readout layer, curl descent can even speed up learning in nonlinear networks. This result holds over a wide range of hyperparameters.
⚫ Cerebellar Block Impairs Adaptation: Blocking cerebellar outflow with high-frequency stimulation (HFS) in the superior cerebellar peduncle significantly impairs force field (FF) adaptation, leading to increased motor noise and reduced error sensitivity.
⚫ Under HFS, neural activity was altered in both a target-dependent and a target-independent manner, showing that cerebellar signals carry task-related information.
⚫ Under HFS, we observed a larger difference between FF and null field (NF) neural activity, indicating a compensatory mechanism in the motor cortex to adapt to the perturbation.
⚫ This compensation involved an angular shift in neural activity, suggesting a "re-aiming" strategy to handle the force field in the absence of cerebellar control.
⚫ HFS led to higher-dimensional neural activity, indicating a loss of the structure in neural representations that supports efficient motor learning and adaptation (one standard way to quantify such dimensionality is sketched after this list).
⚫ The increased dimensionality under HFS was accompanied by a decrease in generalization performance, at both the neural and behavioral levels.
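As referenced in the dimensionality bullet above, here is one standard way an effective dimensionality of neural activity can be quantified. Whether this matches the exact metric used in the study is an assumption, and the data below are random placeholders:

```python
# One common way to quantify the dimensionality of neural activity: the participation
# ratio of the covariance eigenvalue spectrum. Whether this matches the study's exact
# metric is an assumption, and the data below are random placeholders.
import numpy as np

def participation_ratio(activity):
    """activity: (n_samples, n_neurons) array; returns the effective dimensionality."""
    eigvals = np.linalg.eigvalsh(np.cov(activity, rowvar=False))
    eigvals = np.clip(eigvals, 0.0, None)          # guard against tiny negative values
    return eigvals.sum() ** 2 / (eigvals ** 2).sum()

rng = np.random.default_rng(3)
fake_activity = rng.standard_normal((500, 40)) @ rng.standard_normal((40, 40))
print("effective dimensionality:", participation_ratio(fake_activity))
```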