codetalker7
codetalker7.bsky.social
ai, math, open source. upcoming cs phd @utah.edu. prev ra @tifr.res.in and lcs2. family over everything.
we show that CurDKV outperforms SOTA methods (including SnapKV and adaptive variants) even under large compression ratios, while simultaneously reducing generation latency by up to 40%. code and camera-ready version to be released soon!
September 19, 2025 at 3:08 AM
to that end, we propose CurDKV, a novel technique that selects the most important keys and values based on their combined "leverage scores", inspired by the CUR decomposition of a matrix (well-known in low-rank matrix approximation theory). (4/n)
September 19, 2025 at 3:08 AM
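to make the leverage-score idea concrete, here's a minimal numpy sketch of CUR-style row selection for the kv cache. this is an illustration of the general technique, not the paper's actual algorithm; the combination rule (summing key and value scores) and the rank parameter are assumptions for the example.

```python
import numpy as np

def leverage_scores(M, rank):
    # row leverage scores of M w.r.t. its best rank-r approximation:
    # squared row norms of the top-r left singular vectors
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return np.sum(U[:, :rank] ** 2, axis=1)

def select_kv_tokens(K, V, budget, rank=8):
    # hypothetical combined importance: sum of key and value leverage scores
    scores = leverage_scores(K, rank) + leverage_scores(V, rank)
    # keep the `budget` highest-scoring token positions, in original order
    return np.sort(np.argsort(scores)[-budget:])

rng = np.random.default_rng(0)
K = rng.standard_normal((128, 64))  # seq_len x head_dim
V = rng.standard_normal((128, 64))
kept = select_kv_tokens(K, V, budget=32)
```

in CUR decomposition, rows with high leverage scores are exactly the ones whose inclusion best preserves the matrix's low-rank structure, which is why they make natural candidates for keeping in a compressed cache.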
while useful, this heuristic overlooks the fact that the final output of the attention module also involves the "value" vectors. therefore, a good token eviction method should optimize for the combined contribution of the key-value vectors. (3/n)
September 19, 2025 at 3:08 AM
many SOTA kv compression methods (at the time of writing) relied heavily on "attention scores" to evict cached tokens from the kv cache; the underlying assumption is that the most important tokens in the context receive higher attention scores. (2/n)
September 19, 2025 at 3:08 AM
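as a rough illustration, the attention-score heuristic can be sketched like this: score each cached key by the average attention mass it receives from recent queries, and evict the lowest-scoring ones. this is a generic sketch of the idea, not the implementation of SnapKV or any specific method.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_score_eviction(Q, K, budget):
    # average attention mass each cached key receives from the queries
    d = K.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)  # (num_queries, seq_len)
    importance = attn.mean(axis=0)
    # keep the `budget` most-attended token positions, in original order
    return np.sort(np.argsort(importance)[-budget:])

rng = np.random.default_rng(1)
Q = rng.standard_normal((4, 16))    # recent query vectors
K = rng.standard_normal((100, 16))  # cached key vectors
kept = attention_score_eviction(Q, K, budget=20)
```

note that this scoring looks only at keys; as the next post argues, it says nothing about how the retained tokens' values contribute to the attention output.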
moving forward, i'm excited to contribute to building the theoretical foundations of intelligent systems—and to make them more efficient, resource-optimal and secure. (2/2)
June 15, 2025 at 1:29 AM