coolstuffdude.bsky.social
@coolstuffdude.bsky.social
Very cool, I've been asking a similar twisted riddle; models are too good now

A farmer with a wolf, a goat, and a cabbage must cross a river by boat. The boat can carry only the farmer and a single item. If left unattended together, the goat would eat the wolf, or the wolf would eat the cabbage.
December 13, 2025 at 11:36 PM
oooo, ok one theory

The `Context hygiene:` section is a hint for the model that does compaction. So this is saying "During compaction, summarize long sections from loaded skills content"

I actually don't think I've ever double-checked whether it's the same model or a different one for compaction though.
December 13, 2025 at 5:43 AM
Hrm, any idea what it means by "summarize long sections instead of pasting them"?

Referring to reciting information from the skills themselves to the user, I guess? Or maybe related to the output of skills?
December 13, 2025 at 4:42 AM
Hot take: I think they're actively trying their hardest. There are probably 100 people working on it full-time, and throwing more cooks at it would probably make it take longer. They're trying to nerf it as much as they can while still keeping people on the platform. I'd prefer that over people being on grok
November 7, 2025 at 5:13 AM
I would be very interested in seeing an "elder safety" benchmark where you can evaluate models on topics like these. I'd be very curious how different the scores are for gpt-5 vs claude vs grok
November 7, 2025 at 4:40 AM
yep! like if the model is debugging a really verbose python test, instead of running `python test.py` and reading through all of its output, it can run `python test.py | grep "[TEST-ERR]"`

It's even more important with claude's auto-compaction: no matter what in context gets compacted, files still exist
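here's a rough sketch of that idea in python, if it helps (the function and file names are just placeholders, not any real framework):

```python
import subprocess

def run_and_filter(cmd: str, marker: str, log_path: str = "full_output.log") -> str:
    """Run a verbose command, keep the full log on disk, and hand back
    only the lines containing `marker`. Names here are made up; this is
    just the shape of the idea, not any particular agent framework."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    full_output = result.stdout + result.stderr

    # The complete log survives compaction because it lives in a file, not in context.
    with open(log_path, "w") as f:
        f.write(full_output)

    # Only the tagged lines go back into the conversation.
    matching = [line for line in full_output.splitlines() if marker in line]
    return "\n".join(matching) or f"(no lines matched {marker!r}; full log in {log_path})"

# Roughly equivalent to: python test.py 2>&1 | grep -F "[TEST-ERR]"
print(run_and_filter("python test.py", "[TEST-ERR]"))
```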
October 21, 2025 at 2:40 AM
containers elegantly serve as state management for agents, because that's what computers were designed to do for humans! organize info, execute things, handle long-running things

I've wanted to write about this for a while, and your post is so spot-on that you inspired me :)
October 20, 2025 at 2:30 AM
I think we're going to see an arms race in containerized execution environments for models. It's not easy even for openai or anthropic to run 10 or 100 million containers

Having it "just work" on the API is so powerful though; we may see cloud providers have a real role to play here
October 20, 2025 at 2:29 AM
Not the same but related: openai trains its models to manipulate images in their containers so they can extract text or zoom in on images. It's basically skills. It also feels to me like the image manipulation is done via a sub-agent on chatgpt
October 20, 2025 at 2:29 AM
I think you're correct that skills > MCP; skills are inherently _composable_! Your skill can write to files instead of blowing up context, it can be used with any unix commands the model already understands, and it can kick off background processes. We're just scratching the surface
October 20, 2025 at 2:18 AM
I would highly recommend doing this same thing with gpt-oss on their responses api with their web tool enabled. It gives you a good feel for how good the model is once you actually give it the tools it was trained on. It is also very willing to talk about the tool output, compared to closed models
October 20, 2025 at 2:10 AM
I appreciate you posting all of these, I always feel very up to date because of you! :)
October 15, 2025 at 4:04 AM
Theory: Maxwell just escaped, and the person in prison is no longer Maxwell. Then the prison Maxwell dies and the real Maxwell is gone.
October 13, 2025 at 4:15 AM
link?
October 13, 2025 at 1:17 AM
Oh, that's probably why he wanted the Nobel Peace Prize: he thought it would get him into heaven if he got it.
October 12, 2025 at 10:13 PM
I was pretty surprised that I could also hit escape, ask it to give me a status report on all of the agents, and then say continue and they would continue. Super cool!
October 11, 2025 at 10:02 PM
I came across your stuff because of Simon Willison, and I have to say I REALLY like how you think about things with Claude!

Such an amazing resource to read through your blog, thank you :)
October 11, 2025 at 2:54 PM
If you have any questions about the model, lmk. I've spent way too much time with it in vLLM; feel free to DM me
October 10, 2025 at 12:10 AM
Whatever option you go with, triple-check that the tokens coming out look right compared to the messages. The 'chat template' for gpt-oss is super different from any other model's, and most implementations are just broken
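roughly how I'd sanity-check it (the checkpoint name below is an assumption, swap in whatever you're actually serving; this uses the plain transformers tokenizer rather than your serving stack):

```python
from transformers import AutoTokenizer

# Assumed checkpoint name; replace with the gpt-oss build you're actually running.
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Say hi in one word."},
]

# Render the prompt the way generation-time code should see it.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # compare this against what your server actually feeds the model

# Also spot-check the token ids / special tokens at the start of the prompt.
token_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True
)
print(token_ids[:32])
```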
October 10, 2025 at 12:04 AM
If your cards work with it, vLLM may be a better bet if you're getting serious
October 9, 2025 at 11:58 PM
As much as I hate to say it, I think they're going to only pay federal workers in red states, or maybe not pay in Illinois, California and Oregon
October 8, 2025 at 1:26 AM
Or grok still searches his tweets live for how it should reply and he hopes this changes it, but that's too smart for him lol
September 17, 2025 at 2:08 PM
You see, he isn't smart enough to be in the same room as the engineers who actually build grok and who could push back on his ideas, so he has to just declare things on twitter to force them to do it instead
September 17, 2025 at 2:08 PM