PITTI
pitti.io
@pitti.io
Just trying to kill boredom without killing anyone in the process | Anything unrelated to actual (super niche) area of expertise | Dubito ergo sum
Anthropic’s recent evals for the release of Sonnet 4 and Opus 4 represent a good opportunity to re-share a blogpost from last year entitled “Artificial Intelligence : what everyone can agree on”

It touched on benchmark-gaming
May 24, 2025 at 5:20 PM
It’s helpful when you know exactly what you need… and when what you need is reasonably simple … and you know how to fix the buttons that do not work and the design.
I may actually use this as a starting point for something else

It feels more like early Replit than Cursor
May 24, 2025 at 4:09 PM
I gave it another chance on a case where I know that React with TypeScript could be a good option (and I’ve already worked on this, so I can judge the choices made)
May 24, 2025 at 4:09 PM
I’ve been testing the new Google AI Studio feature “Build” as a promising mitigant to Gemini’s code slop tendency (which the AI Studio UI makes a horrible experience)

Unfortunately this is very far from ready to do actual work

Strongly advise waiting a couple of weeks (details below)
May 24, 2025 at 4:09 PM
May 24, 2025 at 4:01 PM
I had this exact discussion yesterday with lawyers and judges. I think they all know that it’s over. In Europe there is a sensitivity about personal data for reasons that I can understand (historically, to protect people from governments!) but the solution seems to be educating users
November 28, 2024 at 12:40 AM
Hahaha
November 27, 2024 at 9:34 PM
Yesterday I was at a symposium organized by the Institut Presaje (which I actively support) at the Cour d’appel de Paris for a discussion on deployed AI.

What’s particularly interesting about this sector is that they approach AI from the bottom (users) and the top (legal/regulatory oversight)
November 27, 2024 at 8:12 PM
Gemini is not a very helpful assistant here

Major grammatical error, I’m not sure I’ve ever seen something like this
November 26, 2024 at 11:09 AM
One of the byproducts of my attempt to find appropriate settings for the entropix sampler with different MLX models is that I ended up replicating parts of the “softmax is not enough” paper across several models.

Qwen2.5 0.5B is very prone to that, much more than the larger Qwen2.5 models
November 26, 2024 at 12:01 AM