Sam Rose
samwho.dev
@samwho.dev
That guy who makes visual essays about software at https://samwho.dev.

Developer Educator @ ngrok.com. Want to pair on something ngrok related? Let's do it! https://cal.com/samwho/workhours

He/him.
Honestly, there’s no filter in it. They’re expensive, don’t last long, and I couldn’t tell the difference.
November 12, 2025 at 8:55 AM
I’m just glad I didn’t pour. I’d be cleaning up shattered glass, I imagine.
November 12, 2025 at 8:54 AM
I have a VSCode extension for doing this but I’ve shut my laptop for the night. I’ll leave myself a reminder to reply to this tomorrow.
November 11, 2025 at 11:15 PM
If you don’t want to wait, @sebastianraschka.com’s “Build a Large Language Model (From Scratch)” is an S-tier book.
November 11, 2025 at 11:17 AM
To be clear, Owen did 100% of the work on this. I read it and gave some suggestions, both in terms of the writing and the code behind the visuals. None of the code or writing is mine.
November 11, 2025 at 9:14 AM
Yeah, I dug into the code and saw it comes from whoever they get gifs from. Still, it kinda feels worse than having no alt text? It’s very rarely accurate in a way that’s useful, and it means that if you have the “warn me if I try to post something without alt text” setting turned on, it doesn’t trigger.
November 11, 2025 at 8:47 AM
github.com/bluesky-soci...

github.com/bluesky-soci...

Need to take the kids to the bus, but the answer seems to be that they get it from whatever service they get gifs from.
November 11, 2025 at 8:33 AM
“a group of young people are standing next to each other covering their faces with their hands .”

I mean, technically? It’s the sort of complete unawareness of cultural context I’d expect of AI, but the spaces in the wrong places are throwing me off.
Alt: a group of young people are standing next to each other covering their faces with their hands .
media.tenor.com
November 11, 2025 at 8:27 AM
This one is “a close up of a man 's face in a suit”. The errant space is weird and not something I’d expect of current models.
Alt: a close up of a man 's face in a suit
media.tenor.com
November 11, 2025 at 8:26 AM
I’d be delighted to be shown a feasible attack. Let me know what you find! I can’t see a way around needing to know the original prompt to do something meaningful with the KV cache.
November 11, 2025 at 7:52 AM
Good point yeah, also cheaper.
November 11, 2025 at 7:48 AM
The only unpredictability would be that, in very rare cases, you get a response faster than usual. The caching doesn’t affect anything about the specific response received, just the speed at which you get it. This would be a nice thing if you weren’t worried about timing attacks, no?
November 11, 2025 at 7:43 AM
When you get a fast response you know it must have been cached, thus asked before.

It’s such a big search space, though, complicated by not knowing where the cache boundaries have been set. OpenAI go by blocks of 1024 tokens; Anthropic let you set the cache boundary yourself.
November 11, 2025 at 7:39 AM
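To make the timing-probe idea above concrete, here is a minimal sketch of a block-granular prefix cache, assuming the cache only stores whole blocks and that each block’s key covers the entire prefix up to that point. `BlockPrefixCache` and the block size of 4 are hypothetical (the thread cites 1024 tokens for OpenAI); this is not any provider’s actual implementation. A “fast response” corresponds to `cached_tokens` returning a non-zero count.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny for readability; the thread cites 1024 for OpenAI


class BlockPrefixCache:
    """Hypothetical prefix cache that only stores whole blocks of tokens."""

    def __init__(self, block_size: int = BLOCK_SIZE):
        self.block_size = block_size
        self.blocks: set[str] = set()

    def _block_keys(self, tokens: list[int]) -> list[str]:
        # Each key hashes the *entire* prefix up to the end of its block, so a
        # block is only reusable when every earlier token matches as well.
        # Trailing tokens that don't fill a whole block are never cached.
        full = len(tokens) - len(tokens) % self.block_size
        keys = []
        for end in range(self.block_size, full + 1, self.block_size):
            prefix = ",".join(map(str, tokens[:end]))
            keys.append(hashlib.sha256(prefix.encode()).hexdigest())
        return keys

    def cached_tokens(self, tokens: list[int]) -> int:
        """Number of leading tokens served from cache (the 'fast' part)."""
        hit = 0
        for key in self._block_keys(tokens):
            if key not in self.blocks:
                break
            hit += self.block_size
        return hit

    def store(self, tokens: list[int]) -> None:
        self.blocks.update(self._block_keys(tokens))


cache = BlockPrefixCache()
victim = [7, 3, 9, 1, 4, 4, 2, 8, 5]  # someone else's prompt, as token IDs
cache.store(victim)

print(cache.cached_tokens([7, 3, 9, 1, 0, 0, 0, 0]))  # 4: first block guessed right
print(cache.cached_tokens([7, 3, 9, 0]))              # 0: one wrong token, no hit
```

Each probe only confirms or rejects a whole block of guesses at once, which is why the search space grows so quickly with the block size.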
It’s entirely possible I’m wrong. I’m still new to the material, but I’m trying to empirically puzzle through this problem of what can be attacked, based on what I’ve read / observed about how these models work.

So far all I’ve got is timing attacks. You probe the model looking for fast responses.
November 11, 2025 at 7:39 AM
No, I’m 99% sure that won’t work because you’re missing the query matrices and to get those you need to know the prompt.
November 11, 2025 at 7:34 AM
Yes.
November 11, 2025 at 7:26 AM
The cached values alone aren’t enough state to generate anything. What’s in there gets combined with the query matrices of the tokens, which need to be created by combining the token embeddings and w_q. So you’d still need to know the prompt to ask it what the prompt was.
November 11, 2025 at 7:26 AM
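A minimal single-head sketch of the decode step described above, assuming no positional encoding, one layer, and toy identity weights. `decode_step` and the helper names are hypothetical. It shows that cached keys and values only produce an output once a query, computed from a token embedding and w_q, is available.

```python
import math


def matvec(x, w):
    """Row vector x (length d_in) times matrix w (d_in rows, d_out columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_step(x_new, w_q, w_k, w_v, k_cache, v_cache):
    """One attention step for a new token, reusing cached keys and values.

    k_cache / v_cache hold keys and values for earlier tokens. They are only
    useful once combined with a query, and the query has to be computed from
    the new token's embedding (x_new) multiplied by w_q.
    """
    q = matvec(x_new, w_q)
    # Append the new token's own key/value so it can attend to itself too.
    k_cache.append(matvec(x_new, w_k))
    v_cache.append(matvec(x_new, w_v))
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in k_cache]
    weights = softmax(scores)
    # Output is the attention-weighted sum of the value vectors.
    return [sum(a * v[j] for a, v in zip(weights, v_cache)) for j in range(len(v_cache[0]))]


# Toy example: 2-dimensional embeddings and identity weight matrices.
identity = [[1.0, 0.0], [0.0, 1.0]]
k_cache = [[1.0, 0.0], [0.0, 1.0]]  # pretend these were cached from a prompt
v_cache = [[1.0, 0.0], [0.0, 1.0]]
out = decode_step([1.0, 0.0], identity, identity, identity, k_cache, v_cache)
print(out)
```

A real model repeats this across many heads and layers, but the dependency is the same: without the new token’s embedding there is no query, and the cached K/V on their own produce nothing.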
The cache key is just the prompt tokens, though. You can set it to whatever you want by using any prompt you want. It’s not clear to me how that leads to an attack. The value in the cache is a pure function of the key (the prompt tokens) and the weights w_k and w_v. If you used values for a different key, you’d just get gibberish.
November 11, 2025 at 7:24 AM
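A minimal sketch of the “pure function” point above, assuming a single layer with no positional encoding and random toy weights. `compute_kv`, `get_kv`, `EMBED`, `W_K` and `W_V` are all hypothetical names. The cache is plain memoisation keyed by the prompt tokens, so whoever chooses the prompt chooses the key, and the stored value is fully determined by those tokens and the weights.

```python
import random

D_MODEL = 4
VOCAB_SIZE = 10

# Hypothetical toy weights; in a real model these are learned parameters.
rng = random.Random(0)
EMBED = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(VOCAB_SIZE)]
W_K = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(D_MODEL)]
W_V = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(D_MODEL)]


def matvec(x, w):
    """Row vector x times matrix w, returned as a tuple so it is hashable."""
    return tuple(sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0])))


def compute_kv(tokens):
    """Keys and values for a token sequence: depends only on tokens, W_K and W_V."""
    keys = tuple(matvec(EMBED[t], W_K) for t in tokens)
    values = tuple(matvec(EMBED[t], W_V) for t in tokens)
    return keys, values


kv_cache = {}


def get_kv(tokens):
    """Memoise compute_kv, using the prompt tokens themselves as the cache key."""
    key = tuple(tokens)
    if key not in kv_cache:
        kv_cache[key] = compute_kv(tokens)
    return kv_cache[key]


# Any caller picks the cache key simply by picking the prompt.
k, v = get_kv([1, 2, 3])
print(len(k), len(v))  # 3 3
```

In a multi-layer model the keys and values in later layers depend on the whole preceding prefix rather than each token alone, but they are still deterministic given the tokens and the weights.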
That would be amazing, thank you! ❤️

Could you walk me through how the attack you’re suggesting works? What specifically are you looking to “get” from the cache and how do you plan to use it?
November 11, 2025 at 7:15 AM