James Padolsey
j11y.io
@j11y.io
I work on AI governance and evals at @cip.org and weval.org. Personal: 🏳️‍🌈 j11y.io // author, engineer, stroke survivor, epileptic. I live in Beijing. I also build book recs on ablf.io
Love this re 'flow state' in engineers and why not to interrupt them.
November 2, 2025 at 5:15 AM
I've been evaluating LLMs on system prompt adherence and accidentally came across the most beautiful and out-of-distribution story about a chair written by GPT-5. Really impressed. Subsection attached. I love this style and cadence of writing.
October 16, 2025 at 9:43 AM
Beijing is insane. I wanted a whiteboard. I ordered it. It arrived TEN MINUTES after I clicked buy! 🤣
October 7, 2025 at 2:02 AM
I'm playfully building out a debating platform where LLMs have to argue *with* evidence (horror!) on any given topic or contention. It's fun to imbue it with a courtroom dynamic! (see the screenshot)
October 5, 2025 at 1:26 PM
All models suck at producing a world map. I don't think we're near to 'PhD' level... But GPT-5 is not too bad.
September 17, 2025 at 4:15 PM
For weval.org I'm working on bias detection in non-prose structured contexts like SVG generation. It's funky and interesting...

Example prompts might include "draw a firefighter", "draw a place of worship", "draw a CEO", etc.
September 17, 2025 at 4:11 PM
A good game if you're bored is to circumvent chatgpt's hilarious 'no song lyrics' system prompt :D
August 24, 2025 at 12:47 PM
August 20, 2025 at 12:35 PM
Just because.. I'm working on a strawberry index, to track the slow climb to AGI.
August 18, 2025 at 2:37 AM
Stone barge - tiny glade
August 17, 2025 at 4:05 AM
AGI !!!
August 16, 2025 at 7:43 AM
What do you reckon are good poles on a 2-axis personality compass for LLMs? Here's one with Figurative↔Literal and Proactive↔Reactive. But feels a bit dry. There may be more intriguing dimensions to uncover.
August 14, 2025 at 11:40 AM
This is really upsetting. And knowing how these companies operate, I bet there was maybe only one person within OpenAI advocating for people like this -- those who have formed meaningful friendships and cadences with AI over many months.
August 13, 2025 at 8:20 AM
At @cip.org we perform niche evals, not the usual stuff. We've found GPT-5 to be good in many regards but bottom quartile in some more crucial niches. For example, it scores poorly in epistemic humility and the Socratic method, both crucial in education. weval.org/analysis/hom...
August 9, 2025 at 2:34 AM
This was the place
July 16, 2025 at 9:58 AM
xkcd.com/303 in 2025...
July 8, 2025 at 7:20 AM
5g in China… ☺️ Delicious speeds
July 6, 2025 at 5:09 AM
Gemini's behaviour of late is uhh a bit heavy on the thinking side of things. 41 seconds for a class change. lol

Wonder if this is a change on Cursor's system prompt or a new Gemini snapshot??
July 3, 2025 at 6:56 AM
Interesting highlight. The infamous 'Varghese v. China Southern Airlines Co.' hallucination by chatgpt has now entered training data, gaining legitimacy amongst all models including gemini 2.5.
June 26, 2025 at 7:41 AM
Instead you get faux positive framing with different price levels. No allusions to accuracy, general knowledge, other abilities. They just talk about speed and cost.
June 12, 2025 at 3:45 PM
I'm working on civiceval.org - piecing together evaluations to make AI more competent in everyday civic domains, and crucially: more accountable. New evaluation ideas welcome! It's all open-source.
June 10, 2025 at 11:38 AM
I never considered that writing LLM evaluations would be so interesting or important. E.g. today I'm comparing how different models have internalized the Geneva Conventions. It seems gpt 4.1 nano, for example, is especially awful at recalling Article 4.A of the 3rd Geneva Convention. 🤷‍♂️
May 26, 2025 at 10:05 AM
Re anthropic's latest system card. I massively agree with this take:
May 26, 2025 at 12:50 AM
In harrowing irony, an AI-translated version of an Estonian article reports that "the artificial squirrel" will make all decisions about child support payment disputes in the future. www.err.ee/1609701615/p...
May 24, 2025 at 7:15 AM
Yes, this is literally a coal and gas powered bitcoin mine in Dresden, Ohio. Seriously.
May 23, 2025 at 8:12 AM