Ashton Six
ashtonsix.com
@ashtonsix.com
Research Engineer (software), with interests in superoptimisation, fast integer compression, and indexing for OLAP
I got SOTA (L1-hot, SIMD) on prefix sum by ADDING instructions (7.7 GB/s → 19.8 GB/s). Consider:

for i = 0..n: out[i] = out[i-1] + in[i]

This SUCKS, because out[i] must wait on out[i-1]. There's an unbroken dependency chain, which disrupts Instruction-Level Parallelism (ILP). 1/
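To make the dependency chain concrete, here's a minimal scalar sketch in C. It is NOT the SIMD kernel behind the numbers above, just a generic illustration. The function names (prefix_sum_serial, prefix_sum_blocked4) and the 4-element block size are made up for this example. The blocked version spends 9 adds per 4 elements instead of 4, but only one of them (the carry update) sits on the loop-carried chain. Whether that is faster depends on the compiler and CPU, so measure before trusting it.

```c
#include <stddef.h>
#include <stdint.h>

/* Serial inclusive prefix sum: every iteration needs the previous result. */
void prefix_sum_serial(const uint32_t *in, uint32_t *out, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += in[i];   /* loop-carried dependency: one add per element */
        out[i] = acc;
    }
}

/* Hypothetical blocked variant: more adds in total, shorter chain. */
void prefix_sum_blocked4(const uint32_t *in, uint32_t *out, size_t n) {
    uint32_t carry = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint32_t a = in[i], b = in[i + 1], c = in[i + 2], d = in[i + 3];

        /* Local sums: none of these depend on carry, so they can
           overlap with the previous block's work. */
        uint32_t ab   = a + b;
        uint32_t cd   = c + d;
        uint32_t abc  = ab + c;
        uint32_t abcd = ab + cd;

        /* Four independent adds against the same carry value. */
        out[i]     = carry + a;
        out[i + 1] = carry + ab;
        out[i + 2] = carry + abc;
        out[i + 3] = carry + abcd;

        /* The only add on the loop-carried chain: one per 4 elements. */
        carry += abcd;
    }
    /* Tail: fewer than 4 elements left, fall back to the serial form. */
    for (; i < n; i++) {
        carry += in[i];
        out[i] = carry;
    }
}
```

Both functions produce the same output (unsigned wraparound included); the difference is only in how much of the work has to wait on the previous result.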
January 17, 2026 at 12:55 AM