This project was an absolute blast to work on with @jpillowtime.bsky.social 🙏🧠. Stay tuned, we’ve got a lot more coming on this front, we’re just getting started! 👀
Does optimizing neuron parameter distributions using the GP help in the finite network regime?
Yes! ✅
Maximizing the GP marginal likelihood yields better fits even when we return to small, finite low-rank RNNs. ✨
The infinite-width theory has real practical payoff! 📈🔥
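For context, the objective here is the standard GP log marginal likelihood. A generic textbook form (notation illustrative only; the paper's parameterization of the covariance and noise may differ) is:

```latex
% Standard GP log marginal likelihood for targets y given inputs X.
% K_\theta is the GP covariance matrix induced by the neuron-parameter
% distribution; \sigma^2 is an observation-noise term (illustrative notation).
\log p(\mathbf{y} \mid X, \theta)
  = -\tfrac{1}{2}\,\mathbf{y}^{\top}\!\left(K_\theta + \sigma^2 I\right)^{-1}\mathbf{y}
    \;-\; \tfrac{1}{2}\,\log\det\!\left(K_\theta + \sigma^2 I\right)
    \;-\; \tfrac{n}{2}\,\log 2\pi
```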
What happens as network width → ∞?
We show infinite-unit networks converge to a Gaussian Process over ODEs!
For certain nonlinearities, the GP covariance can be computed in closed form, letting us analyze and optimize the distribution that generates neuron params. 📊🔁
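As one illustration of the kind of closed form that exists (not necessarily the exact kernel used in the paper): for a ReLU nonlinearity with standard Gaussian weights, the covariance is Cho & Saul's arc-cosine kernel, up to normalization:

```latex
% Arc-cosine kernel (Cho & Saul, 2009): closed-form covariance for a ReLU
% nonlinearity with standard Gaussian input weights, up to normalization.
% \theta is the angle between the two inputs x and x'.
k(\mathbf{x}, \mathbf{x}')
  = \frac{1}{\pi}\,\|\mathbf{x}\|\,\|\mathbf{x}'\|
    \left(\sin\theta + (\pi - \theta)\cos\theta\right),
\qquad
\theta = \arccos\!\left(\frac{\mathbf{x}^{\top}\mathbf{x}'}{\|\mathbf{x}\|\,\|\mathbf{x}'\|}\right)
```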
How many neurons are needed to approximate a given dynamical system?
We extend Orthogonal Matching Pursuit (sparse regression ideas) to identify the smallest low-rank network sufficient for the dynamics — resulting in dynamics-driven network design. 🎯✂️
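Purely as a toy sketch of the OMP idea (plain OMP on a made-up dictionary of candidate neurons, not the paper's extended algorithm or code):

```python
# Toy sketch (not the paper's code): plain Orthogonal Matching Pursuit used to
# pick a small subset of candidate neurons whose tuning curves best reconstruct
# a target vector field sampled on a grid of states.
import numpy as np

def omp_select(Phi, target, n_neurons):
    """Phi: (n_samples, n_candidates) candidate neuron responses.
    target: (n_samples,) samples of the dynamics to approximate.
    Returns the chosen column indices and their readout weights."""
    residual = target.copy()
    chosen = []
    for _ in range(n_neurons):
        # Pick the unused neuron most correlated with the current residual.
        scores = np.abs(Phi.T @ residual)
        scores[chosen] = -np.inf
        chosen.append(int(np.argmax(scores)))
        # Refit readout weights over all chosen neurons by least squares.
        w, *_ = np.linalg.lstsq(Phi[:, chosen], target, rcond=None)
        residual = target - Phi[:, chosen] @ w
    return chosen, w

# Toy usage: 500 random tanh neurons on a 1-D state grid, target dx/dt = -x + x**3.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 200)
a, b = rng.normal(size=500), rng.normal(size=500)
Phi = np.tanh(np.outer(x, a) + b)
idx, w = omp_select(Phi, -x + x**3, n_neurons=10)
```

Each round greedily adds the neuron most correlated with the current residual, then refits all readout weights jointly, which is what keeps the selected set small.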
We adapt the NEF perspective to an online teacher-training setting for gradient-free training (see the sketch below) and find:
✅ Better performance than FORCE, with many fewer neurons
✅ Better performance than backprop-trained networks of similar size
✅ Substantially less training time
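A minimal sketch of what an online, gradient-free readout update can look like, using generic recursive least squares (the same family of update behind FORCE); the paper's teacher-training scheme may differ in its details:

```python
# Minimal illustrative sketch (not the paper's exact algorithm): an online,
# gradient-free readout fit via recursive least squares.
import numpy as np

def rls_fit(Phi, targets, reg=1.0):
    """Phi: (T, N) neuron activities over time; targets: (T,) teacher signal.
    Streams through the data once, updating readout weights w without gradients."""
    T, N = Phi.shape
    w = np.zeros(N)
    P = np.eye(N) / reg                   # running inverse correlation matrix
    for phi, y in zip(Phi, targets):
        Pphi = P @ phi
        k = Pphi / (1.0 + phi @ Pphi)     # gain vector
        err = y - w @ phi                 # teacher error before the update
        w += k * err                      # rank-1 weight correction
        P -= np.outer(k, Pphi)            # Sherman-Morrison update of P
    return w
```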
How do activation choice + param distributions shape RNN dynamics?
Big insight from the Neural Engineering Framework (NEF): neurons act as random nonlinear basis functions approximating an ODE. This view gives a geometric picture of how the choice of tanh vs. ReLU shapes representational capacity. 🌀
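Schematically, the basis-function picture looks like this (paraphrased, illustrative notation):

```latex
% NEF-style view (schematic): each neuron i has random encoding parameters
% (a_i, b_i) and a decoder d_i, and the network approximates the target
% vector field f of the ODE as a sum of random nonlinear basis functions.
\dot{\mathbf{x}}
  \;=\; \sum_{i=1}^{N} \mathbf{d}_i\,\phi\!\left(\mathbf{a}_i^{\top}\mathbf{x} + b_i\right)
  \;\approx\; f(\mathbf{x})
% The decoders d_i are fit by (regularized) least squares, while the activation
% \phi (tanh, ReLU, ...) sets the shape of the basis functions.
```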