codetalker7
codetalker7.bsky.social
ai, math, open source. upcoming cs phd @utah.edu. prev ra @tifr.res.in and lcs2. family over everything.
we show that CurDKV outperforms SOTA methods (including SnapKV and adaptive variants) even under large compression ratios, while simultaneously reducing generation latency by up to 40%. code and camera-ready version to be released soon!
September 19, 2025 at 3:08 AM
to that end, we propose CurDKV, a novel technique that selects the most important keys and values based on their combined "leverage scores", inspired by the CUR decomposition of a matrix (well-known in low-rank matrix approximation theory). (4/n)
September 19, 2025 at 3:08 AM
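to make the leverage-score idea concrete, here's a minimal numpy sketch of CUR-style row selection for the kv cache. this is an illustration of the general technique, not the paper's actual algorithm; the combination rule (summing key and value scores) and the rank parameter are assumptions for the example.

```python
import numpy as np

def leverage_scores(M, rank):
    # row leverage scores of M w.r.t. its best rank-r approximation:
    # squared row norms of the top-r left singular vectors
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return np.sum(U[:, :rank] ** 2, axis=1)

def select_kv_tokens(K, V, budget, rank=8):
    # hypothetical combined importance: sum of key and value leverage scores
    scores = leverage_scores(K, rank) + leverage_scores(V, rank)
    # keep the `budget` highest-scoring token positions, in original order
    return np.sort(np.argsort(scores)[-budget:])

rng = np.random.default_rng(0)
K = rng.standard_normal((128, 64))  # seq_len x head_dim
V = rng.standard_normal((128, 64))
kept = select_kv_tokens(K, V, budget=32)
```

in CUR decomposition, rows with high leverage scores are exactly the ones whose inclusion best preserves the matrix's low-rank structure, which is why they make natural candidates for keeping in a compressed cache.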
while useful, this heuristic overlooks the fact that the final output of the attention module also involves the "value" vectors. therefore, a good token eviction method should optimize for the combined contribution of the key-value vectors. (3/n)
September 19, 2025 at 3:08 AM
many SOTA kv compression methods (at the time of writing) relied heavily on "attention scores" to evict cached tokens from the kv cache; the underlying assumption is that the most important tokens in the context receive higher attention scores. (2/n)
September 19, 2025 at 3:08 AM
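as a rough illustration, the attention-score heuristic can be sketched like this: score each cached key by the average attention mass it receives from recent queries, and evict the lowest-scoring ones. this is a generic sketch of the idea, not the implementation of SnapKV or any specific method.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_score_eviction(Q, K, budget):
    # average attention mass each cached key receives from the queries
    d = K.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)  # (num_queries, seq_len)
    importance = attn.mean(axis=0)
    # keep the `budget` most-attended token positions, in original order
    return np.sort(np.argsort(importance)[-budget:])

rng = np.random.default_rng(1)
Q = rng.standard_normal((4, 16))    # recent query vectors
K = rng.standard_normal((100, 16))  # cached key vectors
kept = attention_score_eviction(Q, K, budget=20)
```

note that this scoring looks only at keys; as the next post argues, it says nothing about how the retained tokens' values contribute to the attention output.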
moving forward, i'm excited to contribute to building the theoretical foundations of intelligent systems—and to make them more efficient, resource-optimal and secure. (2/2)
June 15, 2025 at 1:29 AM