Felix Petersen
@petersen.ai
Machine learning researcher @Stanford. https://petersen.ai/
Join us at our poster session today, 11am-2pm, at East Exhibit Hall A-C *#1502*.
December 12, 2024 at 6:41 PM
...and it speeds up overall training by factors ranging from 1.25x (for large ViT pre-training) to 4x (for ConvNets).
We benchmark TrAct on a suite of 50 experimental settings.
December 4, 2024 at 6:39 PM
Our implementation is efficient, only modifies the gradient in the backward pass, and is compatible with various optimizers. To use *TrAct*, just wrap your first layer in a "TrAct" module...
December 4, 2024 at 6:39 PM
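A minimal sketch of what that wrapping could look like in PyTorch. Only the module name `TrAct` comes from the post; the import path, constructor arguments, and the surrounding toy model are assumptions.

```python
import torch.nn as nn
from tract import TrAct  # hypothetical import path; the actual package layout may differ

model = nn.Sequential(
    TrAct(nn.Conv2d(3, 64, kernel_size=3, padding=1)),  # wrap only the first layer
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10),
)
# Training loop, optimizer, etc. stay unchanged: per the post, TrAct only
# alters the gradient of the wrapped first layer in the backward pass.
```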
We close this gap by proposing TrAct: we conceptually *Tr*ain *Act*ivations. While we can't train activations directly because only the weights are trainable, we formulate an optimization problem that finds the optimal weights to match a gradient-descent step on the activations, and modify the gradients accordingly in closed form.
December 4, 2024 at 6:39 PM
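For readers who want the gist of that closed form, here is a single-example sketch in my own notation (the paper's exact formulation, batching, and choice of the regularization strength lambda may differ).

```latex
% Single-example sketch: x is one input, z = W x the first-layer pre-activations,
% L the loss. A gradient step directly on the activations would be
\[
  z' = z - \eta \, \nabla_z L .
\]
% Since only W is trainable, choose the weights that best realize this step:
\[
  W' = \arg\min_{\tilde W} \; \lVert \tilde W x - z' \rVert_2^2
       + \lambda \, \lVert \tilde W - W \rVert_F^2 ,
\]
% which has the closed-form solution
\[
  W' = W - \eta \, (\nabla_z L) \, x^\top \, (x x^\top + \lambda I)^{-1} ,
\]
% i.e. the usual weight gradient (\nabla_z L) x^\top, right-multiplied by the
% d-by-d matrix (x x^\top + \lambda I)^{-1}, where d is the input dimension.
```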
This means that, at the first layer, vision models learn much more slowly than LLMs, and that learning is actually faster on high-contrast regions of the image than on low-contrast regions, because the weight gradients are proportional to the input pixel values.
December 4, 2024 at 6:39 PM
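A tiny PyTorch illustration of that proportionality (my own toy check, not from the paper): for a linear first layer z = W x, the weight gradient is (dL/dz) x^T, so scaling the input scales the first-layer weight gradient by the same factor.

```python
import torch

torch.manual_seed(0)
W = torch.randn(8, 16, requires_grad=True)
x = torch.randn(16)

def first_layer_grad(inp):
    # Use a linear surrogate loss (sum) so dL/dz stays fixed; this isolates
    # how the weight gradient depends on the input values.
    W.grad = None
    z = W @ inp
    z.sum().backward()
    return W.grad.clone()

g1 = first_layer_grad(x)
g2 = first_layer_grad(2.0 * x)           # "higher-contrast" version of the input
print(torch.allclose(g2, 2.0 * g1))      # True: gradient scales with pixel values
```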
The big difference between LLMs and Vision models lies in the first layer:
* in LLMs we update Embeddings (/activations) directly
* but in Vision models we update the *weights* of the first layer, which causes indirect updates to the Activations (/embeddings); see the sketch below
December 4, 2024 at 6:39 PM
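A small PyTorch illustration of that contrast (my own example; shapes and sizes are arbitrary):

```python
import torch
import torch.nn as nn

# LLM first "layer": a lookup table whose rows ARE the embeddings,
# so the optimizer updates the embeddings/activations directly.
tok_emb = nn.Embedding(num_embeddings=50_000, embedding_dim=768)
tokens = torch.randint(0, 50_000, (4, 128))
h = tok_emb(tokens)            # gradients flow straight into the selected rows

# Vision first layer: activations are W * x, so the optimizer only updates W;
# the activations change indirectly, scaled by the input pixel values x.
patchify = nn.Conv2d(3, 768, kernel_size=16, stride=16)
images = torch.randn(4, 3, 224, 224)
z = patchify(images)           # updating z has to go through W (and thus x)
```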
Have you ever wondered how training dynamics differ between LLMs 🖋️ and Vision 👁️ models? We explore this and close the gap between VMs and LLMs in our #NeurIPS2024 paper "TrAct: Making First-layer Pre-Activations Trainable".
Paper link 📜: arxiv.org/abs/2410.23970
Video link 🎥: youtu.be/ZjTAjjxbkRY
🧵
December 4, 2024 at 6:39 PM
Newton Losses is easy to implement, and its empirical Fisher extension can be added to existing pipelines with a single call of `InjectFisher` between the model and the loss.
November 28, 2024 at 1:49 AM
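A hypothetical usage sketch based on that description. Only the name `InjectFisher` comes from the post; the import path, call signature, and the toy model and loss below are assumptions.

```python
import torch
import torch.nn as nn
from newton_losses import InjectFisher  # hypothetical import path

model = nn.Linear(32, 10)        # toy stand-in for the real network
loss_fn = nn.CrossEntropyLoss()  # stand-in for a hard ranking/graph loss

x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
z = InjectFisher(model(x))   # assumed: the single call "between the model and the loss"
loss = loss_fn(z, y)
loss.backward()
```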
In Newton Losses, we merge SGD training of NNs with a Newton step on the loss. This is crucial for algorithmic losses like ranking and graph losses, especially with vanishing and exploding gradients. Intuition: if the loss is harder to optimize than the NN, we should use a stronger optimization method for the loss.
November 28, 2024 at 1:49 AM
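In symbols, a rough sketch of that split (my notation, not verbatim from the paper): the network itself is trained with SGD, while the loss, viewed as a function of the network outputs z, gets a regularized Newton step.

```latex
% z = f_\theta(x) are the network outputs, L(z) the algorithmic loss.
% Instead of backpropagating \nabla_z L into the network directly, take a
% regularized Newton step on the loss as a function of z:
\[
  \tilde{\nabla}_z L = \bigl( \nabla_z^2 L + \lambda I \bigr)^{-1} \nabla_z L ,
\]
% and use \tilde{\nabla}_z L in place of \nabla_z L when backpropagating into \theta.
% The empirical-Fisher variant replaces the Hessian \nabla_z^2 L with an estimate
% built from gradient outer products, avoiding explicit second derivatives.
```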
Excited to share our #NeurIPS 2024 Oral, Convolutional Differentiable Logic Gate Networks, leading to a range of inference efficiency records, including inference in only 4 nanoseconds 🏎️. We reduce model sizes by factors of 29x-61x over the SOTA. Paper: arxiv.org/abs/2411.04732
November 17, 2024 at 4:34 PM