Samet Oymak
@oymak.bsky.social
EECS Prof @UMich, Research on the Foundations of ML+RL+LLM
https://sota.engin.umich.edu/
All credits go to my amazing students :)
November 21, 2024 at 10:19 PM
Our method uniformly improves language modeling evals with negligible compute overhead. During evals, we just plug in SSA and don't touch the hyperparameters or architecture, so there is likely further headroom.
November 21, 2024 at 10:19 PM
We can also see the approximation benefit directly in the quality/sharpness of the attention maps.
November 21, 2024 at 10:19 PM
Why is this useful? Consider the tokens "Hinton" and "Scientist". These have high cosine similarity, yet we may want to assign them different spikiness levels. We show that this is provably difficult for vanilla attention: its weights have to grow much larger than with our method.
November 21, 2024 at 10:19 PM
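A toy numeric illustration of the point above (my own example, not taken from the paper): with two nearly parallel keys, vanilla softmax attention only becomes spiky when the logit scale, i.e. the query/weight norm, grows, whereas a per-token temperature reaches the same spikiness at a fixed norm.

```python
# Toy illustration (assumed setup: a 2-key softmax with highly similar keys).
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Two keys with high cosine similarity (think "Hinton" vs "Scientist").
k1 = np.array([1.0, 0.05])
k2 = np.array([1.0, -0.05])
q = np.array([0.0, 1.0])  # query that slightly prefers k1

# Vanilla attention: spikiness requires growing the query/weight norm.
for scale in [1, 10, 100]:
    logits = scale * np.array([q @ k1, q @ k2])
    print("norm x", scale, "->", softmax(logits))

# Temperature scaling: same spikiness at fixed norm by shrinking tau.
logits = np.array([q @ k1, q @ k2])
for tau in [1.0, 0.1, 0.01]:
    print("tau", tau, "->", softmax(logits / tau))
```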
The method adds a temperature scaling (scalar gating) after the K/Q/V embeddings and before the softmax. The temperature is a function of the token embedding and its position. Notably, this can be done by
- fine-tuning rather than pretraining
- using very few additional parameters
November 21, 2024 at 10:19 PM
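For concreteness, here is a minimal PyTorch-style sketch of the mechanism described in the post above: a per-token, per-head temperature predicted from the token embedding and its position, applied before the softmax. This is an illustration under assumptions, not the paper's implementation; the module name TempScaledAttention, the temp_proj/pos_bias parameterization, and the softplus are placeholders.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TempScaledAttention(nn.Module):
    """Causal self-attention with a per-query-token, per-head temperature."""

    def __init__(self, d_model: int, n_heads: int, max_len: int = 2048):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)
        # Tiny temperature predictor: token embedding plus a learned positional
        # scalar -> one temperature per head (very few additional parameters).
        self.temp_proj = nn.Linear(d_model, n_heads)
        self.pos_bias = nn.Parameter(torch.zeros(max_len, n_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)

        # Temperature from the query token's embedding and position; softplus
        # keeps it positive. Shape: (B, n_heads, T, 1).
        tau = F.softplus(self.temp_proj(x) + self.pos_bias[:T]) + 1e-4
        tau = tau.transpose(1, 2).unsqueeze(-1)

        scores = (q @ k.transpose(-2, -1)) / math.sqrt(self.d_head)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        scores = scores.masked_fill(causal, float("-inf"))
        attn = torch.softmax(scores / tau, dim=-1)  # per-token spikiness control

        out = (attn @ v).transpose(1, 2).reshape(B, T, D)
        return self.out(out)

# Usage sketch:
# layer = TempScaledAttention(d_model=64, n_heads=4)
# y = layer(torch.randn(2, 16, 64))
```

The only additions over vanilla attention in this sketch are temp_proj and pos_bias, which is consistent with the "very few additional parameters" point above.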
Hello world! Unfortunately, my first post happens to be a paper (thre)ad 😊: Our “Selective Attention” is a simple but effective method that dynamically adjusts the sparsity of the attention maps through temperature scaling: arxiv.org/pdf/2411.12892 (#neurips2024)
November 21, 2024 at 10:19 PM