SIMD-friends: #SSE, #AVX512, #NEON, #RVV
(Opinions are my own)
(not sure if that's a known trick)
Note: compilers optimize `((~mreset) | v)` so that it is only computed once.
Here, the cpp is derived from the measured duration and the clock frequency (i.e. it's meant to be a rough approximation of 'hardware cycles').
I do have 'proper cycles' measurements, but not at this level of granularity.
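A minimal sketch of that derivation, assuming "cpp" stands for cycles per processed element and that a single nominal clock frequency is used. The function and parameter names are hypothetical, not taken from the actual benchmark code:

```c
#include <stdint.h>

/* Hypothetical helper: estimate cycles per element from wall-clock time.
 * duration_s  : measured run time, in seconds
 * freq_hz     : nominal clock frequency, in Hz (assumed constant)
 * n_elements  : number of elements processed during the run
 * Returns 0.0 when n_elements is 0 to avoid dividing by zero. */
static double estimate_cpp(double duration_s, double freq_hz, uint64_t n_elements)
{
    if (n_elements == 0) {
        return 0.0;
    }
    /* elapsed cycles ~= seconds * cycles per second */
    return (duration_s * freq_hz) / (double)n_elements;
}
```

Because the real clock varies with turbo and throttling, a number obtained this way is only an approximation of what a hardware cycle counter would report.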
It is worth noting that only portions of the algorithm have been vectorized, and the performance of non-SIMD parts (which can be branch-heavy) will also vary.
I'd like to thank everyone who has been involved in it (jury members, advisors, colleagues, and of course my family).
This has been a long adventure, and I'm now looking forward to the next thing.