Isaac
@isaac-gerber.bsky.social
Data science, AI, ML.

Sci-fi and fantasy books, and gaming.
The key idea is to separate context encoding from query processing/token generation. The context encoding stage is then divided across multiple blocks for parallel computation.
November 28, 2024 at 2:01 PM
There's a tunable parameter that lets you balance the tradeoff between accuracy and speed. Best of all, it works through a different mechanism than other popular LLM optimization techniques like Flash Attention and KV cache compression, which means you can combine it with them for further speedups.
November 28, 2024 at 2:01 PM
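A minimal sketch of the two-phase idea described above, in plain NumPy. Everything here (encode_context, process_query, block_size, the single-head attention) is my own assumption for illustration, not the actual implementation; block_size stands in for the tunable knob, though in this single-layer toy it only changes how the encoding work is chunked for parallelism.

```python
import numpy as np

D = 64  # head dimension (assumed for the toy example)

def attention(q, k, v):
    """Plain single-head scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(D)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def encode_context(ctx_q, ctx_k, ctx_v, block_size):
    """Phase 1: context encoding, split into independent blocks.

    Each block attends only to itself, so there is no cross-block
    dependency and blocks can be processed in parallel on separate
    workers. What gets kept is each block's slice of the KV cache.
    """
    kv_cache = []
    for start in range(0, ctx_q.shape[0], block_size):
        q = ctx_q[start:start + block_size]
        k = ctx_k[start:start + block_size]
        v = ctx_v[start:start + block_size]
        _ = attention(q, k, v)   # block-local attention; in a real model
                                 # this output would feed the next layer
        kv_cache.append((k, v))
    return kv_cache

def process_query(query_q, kv_cache):
    """Phase 2: query processing / token generation.

    Query tokens attend over the concatenated KV cache from all blocks.
    """
    k = np.concatenate([k for k, _ in kv_cache])
    v = np.concatenate([v for _, v in kv_cache])
    return attention(query_q, k, v)

# Toy usage: smaller blocks mean more parallel chunks during encoding,
# larger blocks keep attention closer to the full-context baseline. In a
# real multi-layer model that is where the accuracy/speed tradeoff shows up.
rng = np.random.default_rng(0)
ctx_q, ctx_k, ctx_v = (rng.standard_normal((1024, D)) for _ in range(3))
query_q = rng.standard_normal((4, D))
cache = encode_context(ctx_q, ctx_k, ctx_v, block_size=256)
out = process_query(query_q, cache)
print(out.shape)  # (4, 64)
```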
but are they too hot?
November 27, 2024 at 1:52 AM
for now
November 26, 2024 at 8:28 PM
true but is rothfuss ever going to finish it? i’d love you to be right!
November 25, 2024 at 1:54 AM
this seems sadly most likely to me too
November 25, 2024 at 1:53 AM