Pavel🥤🇺🇦
reinsteam.bsky.social
graphics/rendering/gpu performance @Xbox ATG // prev: Final Fantasy XV and ATD @Square Enix, NHL and CTG @EA // Opinions are mine //🥤in 🇯🇵 // 日本語OK
For insertion / search, it’s probably better not to have each thread probe its own hash (multiple hashes per wave), since that scatters reads/writes. Instead, repurpose the threads of a wave to linearly probe consecutive slots for a single hash value, then repeat the process for every unique hash within the wave.
November 12, 2025 at 11:43 PM
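To make the wave-cooperative probing idea above more concrete, here is a minimal CPU-side sketch in C++ that emulates a 32-lane wave with plain loops. Everything in it is an assumption made for illustration: the names (`probeWindow`, `waveFindOrInsert`), the multiplicative hash, the `kEmpty` sentinel and the key-only table are hypothetical, and it groups lanes by unique key rather than by unique hash. In a real shader the inner lane loop would be one coalesced read plus a ballot, and the insert would need an atomic compare-and-swap.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kWaveSize = 32;           // assumed wave width
constexpr uint32_t kEmpty    = 0xFFFFFFFFu;  // assumed sentinel for a free slot

// Hypothetical hash (Knuth-style multiplicative); any hash would do here.
inline uint32_t hashKey(uint32_t key) { return key * 2654435761u; }

// One key, probed by the whole (emulated) wave: each lane inspects one slot of a
// contiguous window, so the reads are linear rather than scattered.
// Returns the slot holding the key, or kEmpty if the table is full.
inline uint32_t probeWindow(std::vector<uint32_t>& table, uint32_t key) {
    assert(key != kEmpty);  // the sentinel value cannot be stored as a key
    const uint32_t capacity = static_cast<uint32_t>(table.size());
    const uint32_t base = hashKey(key) % capacity;
    for (uint32_t probed = 0; probed < capacity; probed += kWaveSize) {
        uint32_t firstMatch = kEmpty;  // stands in for the lowest bit of ballot(slot == key)
        uint32_t firstEmpty = kEmpty;  // stands in for the lowest bit of ballot(slot == kEmpty)
        for (uint32_t lane = 0; lane < kWaveSize && probed + lane < capacity; ++lane) {
            const uint32_t slot = (base + probed + lane) % capacity;
            if (table[slot] == key && firstMatch == kEmpty) firstMatch = slot;
            if (table[slot] == kEmpty && firstEmpty == kEmpty) firstEmpty = slot;
        }
        if (firstMatch != kEmpty) return firstMatch;  // key already present
        if (firstEmpty != kEmpty) {                   // claim the first free slot
            table[firstEmpty] = key;                  // would be an atomic CAS on a GPU
            return firstEmpty;
        }
    }
    return kEmpty;  // no free slot left
}

// Process all lanes of a wave: pick the first pending lane's key, probe for it
// cooperatively, retire every lane that holds the same key, and repeat.
inline std::array<uint32_t, kWaveSize> waveFindOrInsert(
        std::vector<uint32_t>& table, const std::array<uint32_t, kWaveSize>& laneKeys) {
    std::array<uint32_t, kWaveSize> slots{};
    uint32_t pending = 0xFFFFFFFFu;  // one bit per lane still waiting for a result
    while (pending != 0) {
        uint32_t leader = 0;
        while (((pending >> leader) & 1u) == 0) ++leader;  // lowest pending lane
        const uint32_t key = laneKeys[leader];
        const uint32_t slot = probeWindow(table, key);
        for (uint32_t lane = 0; lane < kWaveSize; ++lane) {
            if (((pending >> lane) & 1u) != 0 && laneKeys[lane] == key) {
                slots[lane] = slot;
                pending &= ~(1u << lane);
            }
        }
    }
    return slots;
}
```

The outer loop runs once per unique key in the wave, so a wave whose lanes mostly share keys finishes in a few iterations, each of which touches one contiguous window of the table.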
The example makes use of 32-bit elements that hold flags + some data. For complex structures, one can repurpose data bits into an indirection index into an array holding structures (or multiple arrays with different structures, depending on frequency of access)
November 12, 2025 at 11:33 PM
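As a rough illustration of the 32-bit element idea above, the following C++ sketch packs a few flag bits together with an index into a separate array of larger structures. The 4/28 bit split, the `Payload` struct and all function names are assumptions chosen for the example; they are not taken from the original example or the presentation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed layout: top 4 bits hold flags, low 28 bits hold an indirection index.
constexpr uint32_t kFlagBits  = 4;
constexpr uint32_t kIndexBits = 32 - kFlagBits;
constexpr uint32_t kIndexMask = (1u << kIndexBits) - 1u;
constexpr uint32_t kFlagMask  = (1u << kFlagBits) - 1u;

// Hypothetical "complex structure" that does not fit into a 32-bit element.
struct Payload {
    float    position[3];
    uint32_t materialId;
};

// Pack flags and an index into one 32-bit table element.
inline uint32_t packElement(uint32_t flags, uint32_t index) {
    assert(flags <= kFlagMask);   // flags must fit into the top bits
    assert(index <= kIndexMask);  // index must fit into the low bits
    return (flags << kIndexBits) | index;
}

inline uint32_t elementFlags(uint32_t element) { return element >> kIndexBits; }
inline uint32_t elementIndex(uint32_t element) { return element & kIndexMask; }

// Follow the indirection: the table stays compact, the data lives elsewhere.
inline const Payload& resolve(uint32_t element, const std::vector<Payload>& pool) {
    return pool[elementIndex(element)];
}
```

With this split the hash table itself stays at 4 bytes per slot, and rarely accessed fields could be moved into a second pool addressed by the same index.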
There was a presentation from AMD covering some popular parallel primitives. Starting from slide 46, it covers linear probing and then an extension of it, bi-directional probing, which improves search time at the expense of a more complex insert:

gpu-primitives-course.github.io/sa-course-no...
November 12, 2025 at 11:28 PM
This visualization can be misleading if the actual shaders used after the Z pre-pass still have “discard / clip” in them, because RenderDoc replaces them with its “quad overdraw” shader w/o discard/clip. TL;DR: a friendly reminder to check that the actual shaders don’t have “discard/clip”.
October 24, 2025 at 4:27 AM