RPNX
rpnx.bsky.social
SWE in Machine Learning GPU Systems Software at Google.

In my free time, I've lately been building a new programming language.

My interests include Linux, C++, operating systems, GPU programming, optimization, multithreading, and Go.
It may seem like copying into automatic variables creates a memory write, but the compiler's optimizer is much more likely to keep these in registers when it can see that the object's address is never taken. Register copies are super fast!
March 1, 2025 at 8:02 PM
(contd.)

You should copy your inputs into automatic (stack) variables before you start mathing and doing writes. If you write to main memory in the middle of a sequence of calculations and then do a load, the CPU may have to stall while it checks the write buffer for an earlier write to the same address.
March 1, 2025 at 7:57 PM
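A minimal sketch of the idea in the two posts above, in C++. The `Stats` type and both function names are made up for illustration. Whether the second version actually ends up faster depends on the compiler, flags, and target, so treat it as a pattern to measure, not a guarantee.

```cpp
#include <cstddef>

// Hypothetical accumulator type, for illustration only.
struct Stats {
    float sum;
    float sum_sq;
};

// Writes through `out` on every iteration. `out->sum` and `data[i]` are both
// float, so the compiler generally has to assume they might alias and keep
// the stores (and the reloads that follow them) inside the loop.
void accumulate_through_pointer(const float* data, std::size_t n, Stats* out) {
    for (std::size_t i = 0; i < n; ++i) {
        out->sum += data[i];
        out->sum_sq += data[i] * data[i];
    }
}

// Copies the inputs into automatic variables first. Their addresses are never
// taken, so the optimizer is free to keep them in registers for the whole
// loop and write back to memory once at the end.
void accumulate_with_locals(const float* data, std::size_t n, Stats* out) {
    float sum = out->sum;
    float sum_sq = out->sum_sq;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
        sum_sq += data[i] * data[i];
    }
    out->sum = sum;
    out->sum_sq = sum_sq;
}
```

Both functions produce the same result when `out` does not overlap `data`. Comparing the generated assembly at `-O2` is a quick way to check whether the stores moved out of the loop on your compiler.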
I've been an AMD fan for a while. I've had too many issues with Linux drivers on NVIDIA, even if they have faster ray tracing and stuff. My personal rigs are almost all AMD. I have one external GPU for CUDA.
February 28, 2025 at 5:51 PM