Samet Oymak
@oymak.bsky.social
EECS Prof @UMich, Research on the Foundations of ML+RL+LLM
https://sota.engin.umich.edu/
I was actually discussing SimPO a few weeks ago in my LLM class. Solid work!
December 3, 2024 at 5:37 PM
All credits go to my amazing students :)
November 21, 2024 at 10:19 PM
Our method uniformly improves language-modeling evals with negligible compute overhead. During evals we simply plug in SSA without touching hyperparameters or the architecture, so there is likely further headroom.
November 21, 2024 at 10:19 PM
We can also see the approximation benefit directly from the quality/sharpness of the attention maps.
November 21, 2024 at 10:19 PM
Why is this useful? Consider the tokens "Hinton" and "Scientist". These have high cosine similarity, but we wish to assign them different spikiness levels. We show that this is provably difficult to achieve for vanilla attention, namely, its weights have to grow much larger than under our method.
November 21, 2024 at 10:19 PM
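A toy sketch of this argument (my own illustration with made-up numbers, not the paper's code; SSA itself learns the temperature, as described further down the thread):

```python
# Two query embeddings with very high cosine similarity ("Hinton" vs. "Scientist")
# yield nearly the same attention map under a shared scale, so vanilla attention
# can only separate their spikiness by growing its weights; a per-token
# temperature does it with one scalar.
import torch

torch.manual_seed(0)
keys = torch.randn(8, 16)                        # 8 keys, dim 16
q_hinton = torch.randn(16)
q_scientist = q_hinton + 0.05 * torch.randn(16)  # cosine similarity ~0.999

def attn(q, temperature=1.0):
    return torch.softmax((keys @ q) / temperature, dim=-1)

# Shared scale: the two maps are almost identical, hence equally spiky.
print(attn(q_hinton).max().item(), attn(q_scientist).max().item())

# Per-token temperature: "Hinton" gets a spiky map, "Scientist" stays flat,
# without changing any weights.
print(attn(q_hinton, 0.2).max().item(), attn(q_scientist, 1.0).max().item())
```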
The method adds a temperature scaling (scalar gating) after the K/Q/V embeddings and before the softmax. The temperature is a function of the token embedding and its position. Notably, this can be done by
- fine-tuning rather than pretraining
- using very few additional parameters
November 21, 2024 at 10:19 PM
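For concreteness, a minimal sketch of how I read this design (the module and parameter names are my own, not the paper's, and the exact placement of the scaling is an assumption):

```python
import torch
import torch.nn as nn

class TemperatureGate(nn.Module):
    """Predicts a positive scalar temperature per token from its embedding and position."""
    def __init__(self, d_model, max_len=2048):
        super().__init__()
        self.proj = nn.Linear(d_model, 1)    # token-dependent part (very few parameters)
        self.pos = nn.Embedding(max_len, 1)  # position-dependent part

    def forward(self, x):  # x: (batch, seq, d_model)
        pos_ids = torch.arange(x.size(1), device=x.device)
        # softplus keeps the temperature positive
        return nn.functional.softplus(self.proj(x) + self.pos(pos_ids))  # (batch, seq, 1)

def temperature_scaled_attention(q, k, v, temp):
    # temp: (batch, seq, 1), one scalar per query token, e.g. from TemperatureGate
    logits = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    logits = logits / temp  # spikier rows where temp is small, flatter where it is large
    return torch.softmax(logits, dim=-1) @ v
```

Since only the small gate is trained, this is the kind of change one can add during fine-tuning rather than pretraining.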
The intuition is that specific tokens like "Hinton" should receive a spikier attention map than generalist tokens like "Scientist". Learning token-dependent temperatures this way yields the colormap above, where (arguably) more specific words receive lower temperatures.
November 21, 2024 at 10:19 PM