I've seen so many people simply asking Claude to "verify the code" and hoping that's enough.
Untested code is broken code.
Write. Tests.
2/2
As soon as I did that, my code became cleaner and much more efficient.
This might sound weird, but errors didn't make sense because I was thinking about agents the wrong way.
They are processing more than 10M documents, and the pipeline takes several hours to complete.
A significant amount of code was written here, and 0% of it was AI-generated.
4/4
2. Remove unnecessary tokens
3. Remove low-quality data
4. Remove duplicated data
5. Replace personally identifiable information
6. Normalize data formats
7. Align multi-modal data
8. Enrich the data with domain-specific annotations
9. Tokenize the data
3/4
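A minimal sketch of what a few of these steps could look like in plain Python. It covers only four of them (normalizing formats, dropping low-quality text, replacing PII, and removing duplicates). Every name here (`clean_corpus`, `is_low_quality`, the `<EMAIL>` placeholder, the three-word threshold) is a hypothetical choice for illustration, not the actual pipeline described in the thread. It also normalizes before deduplicating so that near-identical strings hash the same, which is an assumption about ordering.

```python
import hashlib
import re
import unicodedata

# Hypothetical, deliberately simple email pattern; real PII detection needs far more.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def normalize(text: str) -> str:
    """Normalize Unicode to NFKC and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def is_low_quality(text: str, min_words: int = 3) -> bool:
    """Toy quality filter: treat very short documents as low quality."""
    return len(text.split()) < min_words


def replace_pii(text: str) -> str:
    """Replace email addresses with a placeholder token."""
    return EMAIL_RE.sub("<EMAIL>", text)


def clean_corpus(docs):
    """Run the toy steps over an iterable of strings and return the survivors."""
    seen = set()
    cleaned = []
    for doc in docs:
        text = normalize(doc)
        if is_low_quality(text):
            continue
        text = replace_pii(text)
        # Exact-match deduplication on a hash of the cleaned text.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        cleaned.append(text)
    return cleaned


docs = [
    "Contact me at  jane@example.com  today",
    "Contact me at jane@example.com today",
    "too short",
    "A perfectly fine sentence.",
]
print(clean_corpus(docs))
```

At 10M+ documents you would likely stream from disk and use approximate deduplication rather than an in-memory set, but the shape of each step stays the same.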
Here are the steps in their pipeline:
2/4
Except English, maybe.
The speed at which you can write good Python code is way more important in most situations than the speed at which that code must run.
I'm not sure about you, but "mostly deterministic" is not the same as "deterministic."
9/9
OpenAI's documentation tells us that we can expect "(mostly) deterministic outputs across API calls."
8/9
There are several reasons:
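• Non-deterministic hardware operations can produce different outputs. The same GPU kernel can return slightly different results from run to run, because floating-point addition is not associative and parallel threads may accumulate values in a different order. This may introduce slight differences in your outputs.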
4/9
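To make the hardware point concrete, here is a small CPU-only sketch of the underlying effect: adding the same floating-point numbers in a different order can give a different result. The helper `sum_in_order` is a hypothetical name. It uses an explicit loop because Python's built-in `sum` may apply compensated summation on newer versions. This only illustrates the arithmetic and does not reproduce any specific GPU kernel.

```python
def sum_in_order(values):
    """Add floats strictly left to right, with no compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


# Same three numbers, two different accumulation orders.
a = sum_in_order([1e16, 1.0, -1e16])   # the 1.0 is lost when added to 1e16
b = sum_in_order([1e16, -1e16, 1.0])   # the large values cancel first
print(a, b)

# Grouping alone is enough to change the last bits of a result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)
```

When thousands of parallel threads accumulate partial sums, the order is not fixed, so tiny differences like these can appear between otherwise identical runs.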