Cyrus Rashtchian
cyroid.bsky.social
Researcher at Google. Improving LLM factuality, RAG and multimodal alignment and evaluation. San Diego. he/him ☀️🌱🧗🏻🏐 Prev UCSD, MSR, UW, UIUC.
[6/6] The other idea is to do the weighted combination at an instance level. We look at intermediate layers for *each token* and slightly modify the overall distribution. This leads to consistent accuracy improvements for many models and datasets!

Would love to see some theory on why this works!
December 13, 2024 at 6:43 PM
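A minimal sketch of the per-token idea described above: at each decoding step, mix the intermediate-layer logits into the final distribution before picking the next token. The uniform layer weighting and the `alpha` mixing strength here are placeholders, not the paper's actual estimator.

```python
import math

def softmax(zs):
    # Numerically stable softmax over a list of logits.
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

def decode_step(layer_logits, alpha=0.1):
    """Pick the next token after mixing intermediate-layer logits
    into the final layer's logits for one token position.

    layer_logits: list of per-layer logit lists; last entry is the
                  model's usual output layer.
    alpha: mixing strength (an assumed hyperparameter, not SLED's).
    """
    final = layer_logits[-1]
    n = len(layer_logits) - 1
    # Uniform average over intermediate layers (placeholder weighting).
    intermediate = [sum(layer[v] for layer in layer_logits[:-1]) / n
                    for v in range(len(final))]
    mixed = [(1 - alpha) * f + alpha * i
             for f, i in zip(final, intermediate)]
    probs = softmax(mixed)
    return max(range(len(probs)), key=probs.__getitem__)  # greedy pick
```

With `alpha=0` this reduces to ordinary greedy decoding on the final logits; the per-token modification only kicks in for `alpha > 0`.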
[5/6] Here's a nice example. We want to do some math. Greedy decoding leads to 5 x $10 = $50 for the overtime pay. This is because A x B = C is a common pattern, but we really need A x B x C = D to get the answer. SLED can help with this because the internal layers happen to predict 'x' instead of '='.
December 13, 2024 at 6:43 PM
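The arithmetic behind that example, with an assumed time-and-a-half overtime multiplier (the thread doesn't state the value of C):

```python
# Hypothetical numbers matching the example; the 1.5x overtime
# multiplier is an assumption, not stated in the thread.
hours, rate, multiplier = 5, 10.0, 1.5

wrong = hours * rate               # A x B = C pattern: 5 x $10 = $50
right = hours * rate * multiplier  # A x B x C = D: 5 x $10 x 1.5 = $75
```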
[4/6] Our main decoding trick is to use a weighted combination of *all of the layers*. More precisely, we project every layer into the same output distribution (over vocab tokens). Then we combine the intermediate "logits" with the final output logits based on our estimate of the LLM's internal knowledge.
December 13, 2024 at 6:43 PM
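A toy sketch of that projection-and-combination step, assuming we can read every layer's hidden state and reuse the model's unembedding matrix. All names, the uniform intermediate weighting, and `alpha` are illustrative, not the paper's exact procedure.

```python
def project_to_vocab(hidden, unembed):
    # hidden: length-H hidden-state vector for one layer.
    # unembed: H x V unembedding matrix (rows = hidden dims, cols = vocab).
    vocab = len(unembed[0])
    return [sum(h * unembed[d][v] for d, h in enumerate(hidden))
            for v in range(vocab)]

def sled_like_logits(hidden_states, unembed, alpha=0.1):
    """Project every layer's hidden state into vocab space, then mix
    the intermediate layers' logits with the final layer's logits.

    hidden_states: one hidden vector per layer; last entry is the
                   final layer used for standard decoding.
    alpha: mixing strength (assumed placeholder hyperparameter).
    """
    layer_logits = [project_to_vocab(h, unembed) for h in hidden_states]
    final = layer_logits[-1]
    n = len(layer_logits) - 1
    # Uniform average stands in for the paper's estimated weights.
    intermediate = [sum(l[v] for l in layer_logits[:-1]) / n
                    for v in range(len(final))]
    return [(1 - alpha) * f + alpha * i
            for f, i in zip(final, intermediate)]
```

In a real transformer the same unembedding (the "logit lens" trick) maps each intermediate hidden state to a vocabulary distribution, which is what makes the layers comparable in the first place.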
[3/6] The key observation is that LLMs "know" a lot more than they "tell" -- basically the training process can favor more popular tokens (in the dataset) rather than more accurate predictions for the query at hand.

So we can utilize this during decoding time...
December 13, 2024 at 6:43 PM
[2/6] Joint work with Jianyi Zhang · Da-Cheng Juan · Chun-Sung Ferng · Heinrich Jiang · Yiran Chen

ArXiv paper: arxiv.org/abs/2411.02433
Project page: jayzhang42.github.io/sled_page/
GitHub: github.com/JayZhang42/S...

But how does it work, you ask?
December 13, 2024 at 6:43 PM