Santiago
@svpino.com
I help companies build Machine Learning • I run http://ml.school. • Posts about what I learn along the way.
If I'm building new functionality, I ask Claude to propose new tests.

I've seen so many people simply asking Claude to "verify the code" and hoping that's enough.

Untested code is broken code.

Write. Tests.
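
To make that concrete, here's the kind of test suite this workflow produces (a hypothetical example runnable with pytest; slugify is a made-up function under test):

```python
# Hypothetical example: tests Claude might propose for a new `slugify` helper.

def slugify(title: str) -> str:
    """The new functionality under test (illustrative only)."""
    return "-".join(title.lower().split())

def test_slugify_basic():
    assert slugify("Hello World") == "hello-world"

def test_slugify_empty_string():
    assert slugify("") == ""

def test_slugify_collapses_whitespace():
    assert slugify("  Machine   Learning  ") == "machine-learning"
```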

2/2
October 2, 2025 at 12:15 PM
Agents are functions. It's much better to think of them as unreliable APIs that will give you the correct answer most of the time.

As soon as I did that, my code became cleaner and much more efficient.
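
Here's roughly what that shift looks like in code (a minimal sketch, not my actual implementation; call_agent and the JSON contract are assumptions):

```python
# Treat the agent like a flaky API: retry, validate, fail loudly.
import json

MAX_RETRIES = 3

def call_agent(prompt: str) -> str:
    """Hypothetical wrapper around whatever agent/LLM you're calling."""
    raise NotImplementedError

def reliable_agent(prompt: str) -> dict:
    last_error = None
    for _ in range(MAX_RETRIES):
        try:
            result = json.loads(call_agent(prompt))  # validate: must be JSON
            if "answer" in result:                   # validate: must have the field
                return result
            last_error = ValueError("missing 'answer' field")
        except json.JSONDecodeError as e:
            last_error = e
    raise RuntimeError(f"Agent failed after {MAX_RETRIES} attempts") from last_error
```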
October 1, 2025 at 3:02 PM
When you anthropomorphize agents, you can't see them for what they are.

This might sound weird, but errors didn't make sense because I was thinking about agents the wrong way.
October 1, 2025 at 3:02 PM
At this point, the data is ready for training.

They are processing more than 10M documents, and the pipeline takes several hours to complete.

A significant amount of code was written here, and 0% of it was AI-generated.

4/4
October 1, 2025 at 12:20 PM
1. Load raw data from 6 different places
2. Remove unnecessary tokens
3. Remove low-quality data
4. Remove duplicated data
5. Replace personally identifiable information
6. Normalize data formats
7. Align multi-modal data
8. Enrich the data with domain-specific annotations
9. Tokenize the data
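
A rough sketch of what a few of these stages might look like (illustrative only; their real pipeline is far larger, and every name below is hypothetical):

```python
# Illustrative sketches of steps 4-6; not the actual pipeline code.
import hashlib
import re

def remove_duplicates(docs: list[str]) -> list[str]:
    """Step 4: drop exact duplicates by hashing each document."""
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

def redact_pii(doc: str) -> str:
    """Step 5: replace simple PII patterns (emails here) with placeholders."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", doc)

def normalize(doc: str) -> str:
    """Step 6: collapse whitespace and lowercase the text."""
    return " ".join(doc.split()).lower()
```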

3/4
October 1, 2025 at 12:20 PM
I wanted to outline the steps to give you an idea of what production applications typically deal with.

Here are the steps in their pipeline:

2/4
October 1, 2025 at 12:20 PM
I'll try to stick around.
September 30, 2025 at 5:36 PM
And no other programming language can match Python in code clarity and readability.

Except English, maybe.
September 30, 2025 at 3:10 PM
On top of that, over 90% of the Python code you interact with uses C/Rust/Fast-thing behind the scenes anyway.

The speed at which you can write good Python code matters far more, in most situations, than the speed at which that code runs.
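
One way to see this, assuming NumPy is installed:

```python
# Summing 10M floats: a pure-Python loop vs. NumPy's compiled C loop.
import time
import numpy as np

data = np.random.rand(10_000_000)

start = time.perf_counter()
total = sum(float(x) for x in data)   # pure Python: interpreter loop per element
print(f"Python loop: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
total = data.sum()                    # NumPy: one call into compiled C
print(f"NumPy sum:   {time.perf_counter() - start:.2f}s")
```

You write the same one-liner either way; the C version just happens to be dramatically faster.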
September 30, 2025 at 3:10 PM
Gemini's documentation says that "responses for a given prompt are mostly deterministic, but a small amount of variation is still possible."

I'm not sure about you, but "mostly deterministic" is not the same as "deterministic."

9/9
September 30, 2025 at 1:04 PM
So, in practice, the LLM you are calling is non-deterministic, even at temperature = 0.

OpenAI's documentation tells us that we can expect "(mostly) deterministic outputs across API calls."
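
You can verify this yourself. A minimal sketch using the OpenAI Python SDK (the model name is just an example):

```python
# Ask the same question twice at temperature=0 and compare the answers.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # example model; use whichever you're testing
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

a = ask("Name three uses of Python.")
b = ask("Name three uses of Python.")
print("identical:", a == b)   # often True, but not guaranteed
```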

8/9
September 30, 2025 at 1:04 PM
• Your model provider will patch, retrain, and fine-tune the model you are using without telling you. What was true yesterday isn't guaranteed today. (I'm kinda cheating with this last point, because it's not related to the nature of models, but it's still a problem we have to deal with.)

7/9
September 30, 2025 at 1:04 PM
• If you are using a model scaled across multiple servers, you are not always hitting the same model instance, which increases the chances of finding issues with differences in hardware.

6/9
September 30, 2025 at 1:04 PM
• Floating-point operations are tricky, and even when using greedy decoding, a model might suffer from tiny numeric drifts that shift the outputs.
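
The root cause fits in a REPL session:

```python
# Floating-point addition is not associative; grouping changes the result.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)   # False
print(a, b)     # 0.6000000000000001 0.6
```

Greedy decoding only needs two logits to swap ranks once for the outputs to diverge from there.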

5/9
September 30, 2025 at 1:04 PM
Theoretically, temperature=0 makes a model deterministic. In practice, this is not the case.

There are several reasons:

• Non-deterministic hardware ops can produce different outputs: two GPU kernels can diverge slightly from run to run, introducing small differences in your outputs.
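
For example, parallel reductions accumulate values in whatever order the hardware schedules them, and order matters in floating point (float32 here to make the effect easy to see):

```python
# Summing the same float32 values in two different orders.
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(10_000_000, dtype=np.float32)
shuffled = rng.permutation(x)

print(x.sum(), shuffled.sum())   # usually differ in the last few bits
```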

4/9
September 30, 2025 at 1:04 PM
Yesterday, most replies tried to school me about the temperature parameter when using an LLM. Yes, you can set this parameter to 0 to try to force a model to return the same answer, but in practice, you'll find this isn't guaranteed.
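
For anyone unfamiliar with what temperature actually does: it divides the logits before the softmax, so as it approaches 0, the distribution collapses onto the single most likely token. A minimal sketch:

```python
# Temperature scales the logits before the softmax.
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.asarray(logits) / temperature
    exps = np.exp(scaled - scaled.max())   # subtract the max for stability
    return exps / exps.sum()

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 1.0))    # spread across tokens
print(softmax_with_temperature(logits, 0.01))   # ~[1, 0, 0]: effectively greedy
```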

3/9
September 30, 2025 at 1:04 PM