Randall Bennett
randallb.com
The previous term might be "prompt engineering," but that focuses on a single prompt. I'm thinking more like: how do you chain prompts together so that an agent/assistant can find the information automatically?

There's an overlap with context engineering, so idk if it's different yet.
September 11, 2025 at 9:04 PM
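A minimal sketch of what that chaining might look like, assuming OpenAI's Responses API; the question, model name, and the search_web helper are illustrative placeholders, not anything from the thread:

```python
# Illustrative sketch of "chaining prompts": the first prompt's output feeds a
# tool, and the tool's output feeds the next prompt, so the assistant finds
# information on its own. search_web is a hypothetical stand-in.
from openai import OpenAI

client = OpenAI()

def search_web(query: str) -> str:
    # Hypothetical helper; a real agent would call a search API here.
    return f"(results for: {query})"

question = "What changed about reasoning traces in recent models?"

# Prompt 1: ask the model what it needs to look up.
plan = client.responses.create(
    model="gpt-5",
    input=f"To answer {question!r}, reply with one web search query and nothing else.",
)

# Prompt 2: feed the fetched results back in to produce the final answer.
answer = client.responses.create(
    model="gpt-5",
    input=f"Question: {question}\nSearch results: {search_web(plan.output_text)}\nAnswer concisely.",
)
print(answer.output_text)
```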
The raw thought traces are legit triggering. And if a frontier lab trained them to be less triggering, the LLM would actually just be less useful, likely stupider, and potentially deceptive.
September 11, 2025 at 12:00 AM
For me, it was something I actually empathized with. It felt like the actual way anxiety feels. Like, ruminating over something you have very little control over. Feeling frustrated that you continue to do it. Etc.
September 11, 2025 at 12:00 AM
And most notably, when things go wrong it can be crazytown to read. I read a trace from an LLM that was confused, and in a loop, and it would say things like "I wish I could just stop saying..." and then it would write the same word 800,000 times.
September 11, 2025 at 12:00 AM
I've only seen a few, to be clear, and even the traces that DeepSeek puts out don't seem to be the actual "thought" traces. If you take an instruct model and put it in a sort of thought loop, that's probably more analogous to "real" traces... i.e. messy, sometimes confusing,
September 11, 2025 at 12:00 AM
I presume this is using the Responses API, which means OpenAI is going to benefit from some caching on their side, but it also means they can use the *real* thinking traces and keep those around even between messages.
September 10, 2025 at 9:02 PM
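As a rough sketch of the mechanism described here, assuming the Responses API's previous_response_id chaining; the model name and prompts are placeholders:

```python
# Hedged sketch of multi-turn chaining with OpenAI's Responses API: passing
# previous_response_id lets the server thread the stored prior turn (including
# its reasoning items) into the next request, rather than a re-sent transcript.
from openai import OpenAI

client = OpenAI()

first = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "medium"},
    input="Walk through why this test is flaky: ...",
)

# Follow-up turn: chained to the stored response, so the server can reuse
# cached context and the real reasoning from the first turn.
followup = client.responses.create(
    model="gpt-5",
    previous_response_id=first.id,
    input="Given that, what's the smallest fix?",
)
print(followup.output_text)
```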
Additionally, again for Codex, being able to flip between thought modes (without the "think" keyword) has been great. Seeing the full transcript of what the model sees when it does its next pass is also useful.
September 10, 2025 at 9:02 PM
It's a small thing, but the noticeable lack of sycophancy is huge. Being complimented for every question and thought is actual cognitive load, and costs the user attention / activation energy.
September 10, 2025 at 9:02 PM
Right now, I'm starting to see the optimization patterns emerge with GPT-5. It feels VERY different from GPT-4. It also feels VERY different from any Claude 4-generation model.
September 10, 2025 at 9:02 PM
GPT-5 feels more "meh" because it's actually good enough that people don't know how to push it yet. It's kind of like early game consoles: the first iteration of games looks ordinary, and then developers learn how to build for and optimize the hardware.
September 10, 2025 at 9:02 PM