Curtis Bezault
cbezault.bsky.social
Because their yields are atrocious, their cores run hot, and their AVX support is worse than AMD’s. :)

There’s a whole lot more to making a good core than an advanced process.
September 15, 2025 at 3:43 PM
(At least in most major coastal US cities)
July 5, 2025 at 6:10 PM
If it can make it through peak hours then it’s fine though, right? Yes, these are niche right now and there are probably “better” investments to be made in mass transit, but at least there aren’t a hundred stakeholders you need to fight to get one of these running.
July 5, 2025 at 6:09 PM
The math is probably still in favor of hydrocarbons from a scheduling standpoint for longer-haul trips, but probably not forever. Fast-charging batteries are getting better constantly.
July 5, 2025 at 4:37 PM
I think some of the hoped-for advantage comes from superior speed over a traditional displacement hull. So you could fit 15-minute charges into the same schedule as the diesel ferry. (Of course we could also have diesel/hydrocarbon hydrofoils.)
July 5, 2025 at 4:36 PM
I wish I could do this :/. I’m stuck charging my EV while I’m at the office, during high-though-not-quite-peak demand.
June 26, 2025 at 11:17 PM
Not sure why you keep coming back to this but for me it’s the never ending struggle for a faster radix sort and Huffman encoding :)
June 23, 2025 at 1:46 PM
What are you talking about? There are literal troops on the ground in LA. No one is looting, no one is burning, people are just mad our state is being occupied.
June 8, 2025 at 10:28 PM
Wasn’t talking about the militarization of police forces at all. That’s clearly been a huge mistake and is a convenient/intentional end state to get a police force into if you want “socially acceptable” boots on the ground to be enforcers of power (state or corporate).
June 8, 2025 at 5:41 PM
Idk, people’s behavior is pretty influenced by tail risk. If there’s really no tail risk anymore…
June 8, 2025 at 5:24 PM
They weren’t but they at least thought there was a low but non-zero chance they’d face some consequences for their actions. Now they’re willing to be more egregious because they expect federal pardons/protection.
June 8, 2025 at 4:49 PM
We’ve managed to get down to 41 uops. Three rounds of binning: the first takes 7 uops, the next two take 8 each (we can shave one uop per round off with some more work). Then an 18-uop pospopcount on 32-bit wide counters. So 7+8+8+18.
I’m not particularly happy with all the store uops though.
May 7, 2025 at 11:38 PM
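For readers who haven't met the term: assuming "pospopcount" here means positional population count (count, for each bit position, how many inputs have that bit set), a plain scalar reference looks roughly like the sketch below. The function name, the 8-bit input width, and the wraparound at 32 bits are illustrative assumptions, not the vectorized code discussed in the post.

```python
def pospopcount8(data: bytes) -> list[int]:
    """Scalar reference: counts[i] = number of bytes in data with bit i set."""
    counts = [0] * 8
    for b in data:
        for bit in range(8):
            counts[bit] += (b >> bit) & 1
    # Assumption: mimic 32-bit wide counters by wrapping at 2**32.
    return [c & 0xFFFFFFFF for c in counts]


# The uop budget quoted in the post: three binning rounds plus the pospopcount.
uop_budget = 7 + 8 + 8 + 18  # 41
```

A reference like this is mostly useful for checking a SIMD version against on random inputs.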
I seem to recall Nintendo engineers being aware of this and intentionally deciding to allow this exploit to keep people from losing their Pokémon.
April 29, 2025 at 2:29 AM
How do you exit the escrow state if the trade is interrupted while both are in that state?
April 29, 2025 at 2:22 AM
Lmao at the quantum computing bit.
April 28, 2025 at 1:36 AM
Yeah I was thinking the peak on zen5 could be 0.25 cycles per byte but we’re not running any zen5 machines in production so I didn’t bother going any further.
How many uops do you have on your hot path? My code can’t go faster than 0.42 on ER based on uops.
February 22, 2025 at 6:00 AM
0.33 cycles per byte on Zen5
February 18, 2025 at 11:56 PM
Didn’t know about shldvq. Shaved one uop off with that. (No need to mask the byte we’re shifting by from the input for the first shift). Also gets rid of the mask like you said.
February 16, 2025 at 11:38 PM
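Assuming "shldvq" refers to the AVX-512 VBMI2 instruction VPSHLDVQ, each 64-bit lane concatenates two quadwords, shifts the 128-bit value left by a per-lane count, and keeps the upper half. The hardware only looks at the low 6 bits of the count, which is one plausible reading of why a separate mask becomes unnecessary. A single-lane model, with a hypothetical function name:

```python
MASK64 = (1 << 64) - 1


def shldv_q(dst: int, src: int, count: int) -> int:
    """Model one 64-bit lane of a concatenate-and-shift-left (VPSHLDVQ-style)."""
    c = count & 63  # only the low 6 bits of the count are used
    wide = ((dst & MASK64) << 64) | (src & MASK64)  # dst is the high half
    return ((wide << c) >> 64) & MASK64  # keep the upper 64 bits of the shift
```

So `shldv_q(x, y, 65)` behaves like a shift by 1, pulling the top bit of `y` into the bottom of `x`.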
To close this out I'm getting 0.43-0.44 cycles per byte on ER. I'm more than happy with that :)
February 16, 2025 at 8:30 PM
I’m sure you’re looking but also both gcc and clang were sometimes producing bad codegen.
February 16, 2025 at 2:51 PM
Different generations, different details. The things we’ve done haven’t been universal wins. Jorn has mostly been looking at Ice Lake and I have been looking at ER. Trying to find a tuning that works out the best for both.
February 16, 2025 at 2:50 PM
Correction: 9p5 + 11p0 for binning. Maybe I’ll make those equal.
February 16, 2025 at 8:02 AM
The code as I have it now is 1 load uop, 4 store uops, 9 port5 uops and 10 port0 uops for binning.
1p23A + 10p0 + 10p5 + 14p05 for the histogramming. Should be able to run well under 0.5 cycles per byte but not quite there yet.
February 16, 2025 at 7:55 AM
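A rough way to turn port counts like "10p0 + 10p5 + 14p05" into a throughput floor: uops pinned to one port need at least that many cycles, and everything that can only run on ports 0 and 5 needs at least half the total. The sketch below applies that to the histogramming numbers from the post. The 64 bytes per iteration is purely an assumption (one 512-bit vector per loop trip), and the bound ignores loads, stores, front-end limits, and dependencies.

```python
def port_bound_cycles(p0: int, p5: int, p05: int) -> float:
    """Lower bound on cycles per iteration from port 0 / port 5 pressure alone."""
    return max(p0, p5, (p0 + p5 + p05) / 2)


# Histogramming counts from the post: 10p0 + 10p5 + 14p05.
hist_cycles = port_bound_cycles(p0=10, p5=10, p05=14)

# Assumption, not from the post: each iteration consumes 64 input bytes.
ASSUMED_BYTES_PER_ITER = 64
hist_cycles_per_byte = hist_cycles / ASSUMED_BYTES_PER_ITER
```

Under those assumptions the histogramming alone cannot beat 17 cycles per iteration; the real figure depends on how much input each iteration actually covers and on what else shares the loop.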
Doing the single load did not appear to help on Ice Lake but it made a difference on Emerald Rapids. Fewer uops is always worth it.
February 16, 2025 at 7:18 AM