coolstuffdude.bsky.social
@coolstuffdude.bsky.social
Very cool, I've been asking a similar twisted riddle; models are too good now

A farmer with a wolf, a goat, and a cabbage must cross a river by boat. The boat can carry only the farmer and a single item. If left unattended together, the goat would eat the wolf, or the wolf would eat the cabbage.
December 13, 2025 at 11:36 PM
oooo, ok one theory

The `Context hygiene:` section is a hint for the model that does compaction. So this is saying "During compaction, summarize long sections from loaded skills content"

I actually don't think I've ever double-checked whether it's the same model or a different one for compaction though.
December 13, 2025 at 5:43 AM
Hrm, any idea what it means by "summarize long sections instead of pasting them"?

Referring to reciting information from the skills themselves to the user, I guess? Or maybe related to the output of skills?
December 13, 2025 at 4:42 AM
Hot take: I think they're actively trying their hardest. There are probably 100 people working on it full-time, and throwing more cooks at it would probably make it take longer. They're trying to nerf it as much as they can while still keeping people on the platform. I'd prefer that over people being on grok
November 7, 2025 at 5:13 AM
I would be very interested in seeing an "elder safety" benchmark where you can evaluate models on topics like these. I'd be very curious how different the scores are for gpt-5 vs claude vs grok
November 7, 2025 at 4:40 AM
yep! like if the model is debugging a really verbose python test, instead of running `python test.py` and reading through all of its output, it can run `python test.py | grep "[TEST-ERR]"`

It's even more important with claude's auto-compaction: no matter what in context gets compacted, files still exist
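here's a rough sketch of that idea in python, if it helps (the function and file names are just placeholders, not any real framework):

```python
import subprocess

def run_and_filter(cmd: str, marker: str, log_path: str = "full_output.log") -> str:
    """Run a verbose command, keep the full log on disk, and hand back
    only the lines containing `marker`. Names here are made up; this is
    just the shape of the idea, not any particular agent framework."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    full_output = result.stdout + result.stderr

    # The complete log survives compaction because it lives in a file, not in context.
    with open(log_path, "w") as f:
        f.write(full_output)

    # Only the tagged lines go back into the conversation.
    matching = [line for line in full_output.splitlines() if marker in line]
    return "\n".join(matching) or f"(no lines matched {marker!r}; full log in {log_path})"

# Roughly equivalent to: python test.py 2>&1 | grep -F "[TEST-ERR]"
print(run_and_filter("python test.py", "[TEST-ERR]"))
```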
October 21, 2025 at 2:40 AM
containers elegantly serve as state management for agents, because that's what computers were designed to do for humans! organize info, execute things, handle long-running things

I've wanted to write about this for a while, and your post is so spot-on that you inspired me :)
October 20, 2025 at 2:30 AM
I think we're going to see an arms race in containerized execution environments for models. It's not easy even for openai or anthropic to run 10 or 100 million containers

Having it "just work" on the API is so powerful though; we may see cloud providers have a real role to play here
October 20, 2025 at 2:29 AM
Not the same but related: openai trains its models to manipulate images in their containers so they can extract text or zoom in on images. It's basically skills. It also feels to me like the image manipulation is done via a sub-agent on chatgpt
October 20, 2025 at 2:29 AM
I think you're correct that skills > MCP; skills are inherently _composable_! Your skill can write to files instead of blowing up context, it can be used with any unix commands the model already understands, and it can kick off background processes. We're just scratching the surface
October 20, 2025 at 2:18 AM
I would highly recommend doing this same thing with gpt-oss on their responses api with their web tool enabled. It gives you a good feel for how good the model is once you actually give it the tools it was trained on. It is also very willing to talk about the tool output, compared to closed models
October 20, 2025 at 2:10 AM
I appreciate you posting all of these, I always feel very up to date because of you! :)
October 15, 2025 at 4:04 AM
Theory: Maxwell just escaped, and the person in prison is no longer Maxwell. Then the prison Maxwell dies and the real Maxwell is gone.
October 13, 2025 at 4:15 AM
link?
October 13, 2025 at 1:17 AM
Oh, that's probably why he wanted the Nobel Peace Prize: he thought it would get him into heaven if he got it.
October 12, 2025 at 10:13 PM
I was pretty surprised that I could also hit escape, ask it to give me a status report on all of the agents, and then say continue and they would continue. Super cool!
October 11, 2025 at 10:02 PM
I came across your stuff because of Simon Willison, and I have to say I REALLY like how you think about things with Claude!

Such an amazing resource to read through your blog, thank you :)
October 11, 2025 at 2:54 PM
If you have any questions about the model, lmk. I've spent way too much time with it in vLLM; feel free to DM me
October 10, 2025 at 12:10 AM
Whatever option you go with, triple-check that the tokens coming out look right compared to the messages. The 'chat template' for gpt-oss is super different from any other model's, and most implementations are just broken
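roughly how I'd sanity-check it (the checkpoint name below is an assumption, swap in whatever you're actually serving; this uses the plain transformers tokenizer rather than your serving stack):

```python
from transformers import AutoTokenizer

# Assumed checkpoint name; replace with the gpt-oss build you're actually running.
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Say hi in one word."},
]

# Render the prompt the way generation-time code should see it.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # compare this against what your server actually feeds the model

# Also spot-check the token ids / special tokens at the start of the prompt.
token_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True
)
print(token_ids[:32])
```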
October 10, 2025 at 12:04 AM
If your cards work with it, vLLM may be a better bet if you're getting serious
October 9, 2025 at 11:58 PM
As much as I hate to say it, I think they're going to only pay federal workers in red states, or maybe not pay in Illinois, California and Oregon
October 8, 2025 at 1:26 AM
Or grok still searches his tweets live for how it should reply and he hopes this changes it, but that's too smart for him lol
September 17, 2025 at 2:08 PM
You see, he isn't smart enough to be in the same room as the engineers who actually build grok and who could push back on his ideas, so he has to just declare things on twitter to force them to do it instead
September 17, 2025 at 2:08 PM