jayk56.bsky.social
@jayk56.bsky.social
the awkward part of participating in #nokings when you're from sac..
June 15, 2025 at 4:16 PM
help! i just asked claude opus to build bloxorz and it's begun optimizing it in a dev loop. it hasn't let me test since version 4...
May 22, 2025 at 7:46 PM
Another fun fact is the new Gemini 2.5 Flash (non-thinking) preview model performs on par with Claude 3.7 Sonnet (non-thinking) on this benchmark for 1/10th the price...
May 22, 2025 at 5:57 AM
Gemini 2.5 Pro is a beast for the price and, given 8 tries per problem, scores 93% on the Aider benchmark. o4-mini (high) was not far behind, but cost about 30% more to score 91%. Sonnet 3.7 without thinking is able to average 87% but costs about $45 per run (more than double the cost of Gemini).
May 13, 2025 at 7:58 PM
Almost... follow here to get the next installment in this saga
May 8, 2025 at 2:17 AM
what do we think? line go up? #aider
May 7, 2025 at 12:01 AM
Hundreds of SacTown locals showing up today to protect veterans, retirees, and science and protest oligarchs carving up the American public's wealth. Let's turn those honks into action with the Hands OFF! protest April 5th.

#SacTeslaTakedown #TeslaTakedown #StandUpForScience #HandsOff
March 29, 2025 at 10:56 PM
I might be struggling with prompting, but o3-mini-high provided documents 1-3 and o1 provided document 4. Here is a summary of what I was noticing (as stated by o3-mini-high grading the responses)
January 31, 2025 at 9:31 PM
OpenAI's Realtime API feels head and shoulders above Gemini 2.0 Flash live, but it's also possible it's just easier to get working examples going. I was comparing an example React component with 4o-mini against Google's example Python cookbook with Gemini 2.0 Flash, and Flash just felt shallow..
December 24, 2024 at 11:58 PM