Felix Petersen
@petersen.ai
Machine learning researcher @Stanford. https://petersen.ai/
Learn more in our paper (arxiv.org/abs/2410.23970) and check out our paper video on YouTube: youtu.be/ZjTAjjxbkRY
Computer Vision Models with LLM Training Dynamics (TrAct)
December 4, 2024 at 6:39 PM
...and it speeds up overall training by factors ranging from 1.25x (for large ViT pre-training) to 4x (for ConvNets).
We benchmark TrAct on a suite of 50 experimental settings.
December 4, 2024 at 6:39 PM
Our implementation is efficient, only modifies the gradient in the backward, and is compatible with various optimizers. To use *TrAct*, just wrap your first layer in a "TrAct" module...
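For reference, a minimal sketch of what that wrapping could look like in PyTorch; the import path, module name, and the `lambda_` keyword are assumptions for illustration, so refer to the official TrAct code for the exact API:

```python
import torch
import torchvision

# Hypothetical import path; the actual TrAct package/module name may differ.
from tract import TrAct

model = torchvision.models.resnet18(num_classes=10)

# Wrap only the input-facing layer; the rest of the model is left untouched.
# The keyword name and value of the TrAct hyperparameter are placeholders.
model.conv1 = TrAct(model.conv1, lambda_=0.1)

# Training then proceeds as usual, with any optimizer.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
```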
December 4, 2024 at 6:39 PM
Thus, we can effectively train the first-layer activations of a vision model, with updates similar to those applied to the embedding layer of an LLM.
December 4, 2024 at 6:39 PM
We close this gap by proposing TrAct: we conceptually *Tr*ain *Act*ivations. While we can't train activations directly because only the weights are trainable, we formulate an optimization problem that finds the weights matching a gradient-descent step on the activations, and modify the gradients accordingly in closed form.
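For a single input x with first-layer pre-activations z = Wx, gradient g = ∇_z L, and learning rate η, a simplified single-example version of this problem and its minimum-norm closed-form solution read as follows (the paper additionally introduces a hyperparameter λ and handles batches):

```latex
W' \;=\; \arg\min_{\tilde W} \bigl\lVert \tilde W x - (z - \eta\, g) \bigr\rVert_2^2
\qquad\Longrightarrow\qquad
W' \;=\; W - \eta\, \frac{g\, x^{\top}}{x^{\top} x}.
```

In other words, the usual first-layer gradient g x^T gets rescaled by 1/(x^T x), which decouples the effective step on the activations from the input magnitude.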
December 4, 2024 at 6:39 PM
This means that, at the first layer, a vision model learns much more slowly than an LLM does at its embedding layer, and that it actually learns faster on high-contrast regions of the image than on low-contrast regions, because the weight gradients are proportional to the input pixel values.
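Concretely, for first-layer pre-activations z = Wx, the chain rule gives

```latex
\frac{\partial L}{\partial W} \;=\; \frac{\partial L}{\partial z}\, x^{\top},
```

so the weight update scales directly with the pixel values x: low-contrast regions (small |x| after normalization) produce correspondingly small first-layer updates.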
December 4, 2024 at 6:39 PM
The big difference between LLMs and vision models lies in the first layer:
* in LLMs, we update the embeddings (i.e., the activations) directly;
* but in vision models, we update the *weights* of the first layer, which only indirectly updates the activations (i.e., the embeddings); a toy contrast is sketched below.
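A toy PyTorch contrast of the two update paths (layer sizes and inputs are arbitrary, purely for illustration):

```python
import torch
import torch.nn as nn

# LLM-style first layer: an embedding table. The loss gradient lands directly
# on the rows that were looked up, so the "activations" are updated directly.
emb = nn.Embedding(num_embeddings=1000, embedding_dim=64)
tokens = torch.randint(0, 1000, (8,))
emb(tokens).sum().backward()
print(emb.weight.grad[tokens[0]])   # nonzero only for rows that were used

# Vision-style first layer: a convolution. The gradient lands on the shared
# weights and scales with the input pixels; activations change only indirectly.
conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
images = torch.randn(8, 3, 224, 224)
conv(images).sum().backward()
print(conv.weight.grad.shape)       # a single shared weight gradient
```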
December 4, 2024 at 6:39 PM
Check out our 5-minute paper video on YouTube 🎥: www.youtube.com/watch?v=7aFP...
Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms - NeurIPS2024
November 28, 2024 at 1:49 AM
A big thanks to my co-authors Christian Borgelt, @tobiassutter.bsky.social, @hildekuehne.bsky.social, Oliver Deussen, and Stefano Ermon.

Also a shout-out to the authors of the methods we build on: @qberthet.bsky.social, @mblondel.bsky.social, @marcocuturi.bsky.social, and @bachfrancis.bsky.social.
November 28, 2024 at 1:49 AM
Newton Losses is easy to implement, and its empirical Fisher extension can be added to existing pipelines with a single call of `InjectFisher` between the model and the loss.
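As a rough illustration of the empirical-Fisher idea (not the authors' actual `InjectFisher` implementation, whose API may differ): an identity layer between model and loss that, in the backward pass, preconditions the incoming gradient with the inverse of a regularized empirical Fisher matrix built from the per-sample gradient rows.

```python
import torch
import torch.nn as nn

class FisherPrecondition(torch.autograd.Function):
    """Identity in the forward pass; in the backward pass, replace the gradient
    g with (F + lam*I)^{-1} g, where F = (1/b) G^T G is the empirical Fisher
    built from the per-sample gradient rows G. Illustrative stand-in only."""

    @staticmethod
    def forward(ctx, z, lam):
        ctx.lam = lam
        return z.view_as(z)

    @staticmethod
    def backward(ctx, grad_z):
        b, d = grad_z.shape
        fisher = grad_z.T @ grad_z / b + ctx.lam * torch.eye(d, device=grad_z.device)
        new_grad = torch.linalg.solve(fisher, grad_z.T).T
        return new_grad, None

# Tiny end-to-end example with a placeholder model and loss.
model = nn.Linear(16, 8)
x, y = torch.randn(32, 16), torch.randn(32, 8)
z = FisherPrecondition.apply(model(x), 0.1)   # inserted between model and loss
loss = nn.functional.mse_loss(z, y)           # stands in for an algorithmic loss
loss.backward()
```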
November 28, 2024 at 1:49 AM
In Newton Losses, we merge SGD training of the neural network with a Newton step on the loss. This is crucial for algorithmic losses such as ranking and graph losses, especially when they exhibit vanishing or exploding gradients. The intuition: if the loss is harder to optimize than the neural network, we should use a stronger optimization method for the loss.
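In symbols (a simplified reading, with z = f_θ(x) the network output, g = ∇_z ℓ(z), H = ∇²_z ℓ(z), regularizer λ, and learning rate η), the plain chain-rule update is replaced by a Newton-preconditioned one at the output:

```latex
\theta \;\leftarrow\; \theta - \eta \left(\frac{\partial z}{\partial \theta}\right)^{\!\top} (H + \lambda I)^{-1} g ,
```

so the hard-to-optimize loss receives a second-order step while the network itself is still trained with first-order, SGD-style updates.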
November 28, 2024 at 1:49 AM
If you're excited about #AI with #logic, check out our fully animated video on YouTube: youtu.be/FKQfMwFZvIE
Convolutional Differentiable Logic Gate Networks - NeurIPS Oral - difflogic
November 17, 2024 at 4:34 PM