meshoptimizer, pugixml, volk, calm, niagara, qgrep, Luau
https://github.com/zeux
https://zeux.io
There is some latency penalty to read the discriminator to decode the instance, and the resulting offset for the actual data might lead to unaligned loads, which is not ideal. I'm wondering if it makes sense to have a baseline alignment - e.g. 16 bytes - and encode the id into the lower bits of the instance ref.
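A minimal sketch of what the low-bits idea could look like, assuming the instance ref is a plain pointer-sized integer and the data it points to is 16-byte aligned. All names here (`InstanceRef`, `makeRef`, `refId`, `refData`) are hypothetical and not taken from any existing codebase; 16 bytes is just the example figure from the post, which leaves 4 bits for the id.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants: a 16-byte baseline alignment frees the low 4 bits.
constexpr std::uintptr_t kAlign = 16;
constexpr std::uintptr_t kTagMask = kAlign - 1;

// Hypothetical reference type: aligned data address with the id packed into the low bits.
struct InstanceRef {
    std::uintptr_t bits;
};

inline InstanceRef makeRef(const void* data, unsigned id) {
    auto p = reinterpret_cast<std::uintptr_t>(data);
    assert((p & kTagMask) == 0); // data must honor the baseline alignment
    assert(id <= kTagMask);      // only 16 distinct ids fit in 4 bits
    return InstanceRef{p | id};
}

// The id is available from the ref itself, with no extra memory read.
inline unsigned refId(InstanceRef r) {
    return static_cast<unsigned>(r.bits & kTagMask);
}

// Masking the id off recovers the aligned data address.
inline const void* refData(InstanceRef r) {
    return reinterpret_cast<const void*>(r.bits & ~kTagMask);
}

// Small self-check: pack an id next to an aligned payload and unpack both.
inline bool roundTrips() {
    alignas(16) static const float payload[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    InstanceRef r = makeRef(payload, 5);
    return refId(r) == 5 && refData(r) == payload;
}
```

The trade-off in this sketch is that the id space is capped by the alignment (16 values for 16 bytes), and every instance pays padding up to the baseline, in exchange for skipping the dependent load of a separate discriminator.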
While I *can* make the green bars completely solid, I've already spent way longer than I should have on this exercise, so this will have to do!
2080 had 6 GPCs @ 1.5 GHz, 5070 has 5 GPCs @ 2.3 GHz. So just in general fairly close, as long as they didn't increase tri/GPC rate.
And probably, from the architectural perspective, pure rasterization bottlenecks were squeezed dry 7 years ago and there's not much else to do, and not much need - 19B/sec is enough.
Of course, when people say "rasterization", they usually mean modern ALU-heavy rendering pipelines - not a pure geometry stress test. Still!
Caveat: no 2080 to retest again :)
- Curiously, the commit log said it ran at ~19B tri/sec on RTX 2080. On my RTX 5070 now, I get ~17B tri/sec on the same mesh. 5070 is 250W, my 2080 was a 215W model.