dzaima.bsky.social
@dzaima.bsky.social
Clang seems to be able to assemble "push r16" & co, and "sde -future" runs those just fine, with unaligned rsp and all.
August 1, 2025 at 9:36 PM
vpclmulqdq is hard to beat. RISC-V has some fun ones, e.g. vsuxseg7ei16.v. It even extends the fun to extension names, e.g. Zvfbfwma (which contains the vfwmaccbf16.vv and vfwmaccbf16.vf instructions, though those are slightly less unreadable).
May 12, 2025 at 1:39 AM
RISC-V already has some standard overloaded opcodes (some compressed instrs for embedded ↔ compressed FP load/store iirc); and even if no such standard ones existed, overlaps could still happen with custom vendor extensions (a trivial consequence of an open ISA, like it or not).
April 1, 2025 at 4:50 AM
If you add a dot to make pclmul.qdq, it's not far off from RISC-V's equivalent, vclmul.vv (qdq and vv convey entirely different kinds of information, but eh).

Granted, figuring out where to add that dot is quite important but rather non-trivial, and if x86 actually had such dots it'd be much more readable.
March 3, 2025 at 5:43 PM
A summary (in the context of the array language BQN; it also covers less-than-scan, i.e. the scan for determining backslash escapes à la simdjson, and ≤-scan), written up by Marshall Lochbaum, is now at mlochbaum.github.io/BQN/implemen....
BQN: Implementation of Fold and Scan
mlochbaum.github.io
February 18, 2025 at 9:29 PM
via some SMTing:
and-scan (2 instrs with andn): v &~ (v+1)
seg-and-scan (4 instrs with andn): t = (v &~ m) >> 1; (v - t) ^ t
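
A minimal sketch for checking the two formulas above against a naive bit-by-bit scan. This is my reading, not something the post states: the scan runs from the least significant bit upward, a set bit in m marks the first bit of a segment, and bit 0 always starts a segment. The function names and the 64-bit word size are hypothetical choices for illustration.

```python
MASK64 = (1 << 64) - 1  # assumed word size; the post does not specify one


def and_scan(v):
    # Keeps only the run of ones starting at bit 0: v &~ (v+1).
    return v & ~(v + 1) & MASK64


def seg_and_scan(v, m):
    # t = (v &~ m) >> 1; result = (v - t) ^ t.
    t = (v & ~m & MASK64) >> 1
    return ((v - t) ^ t) & MASK64


def ref_seg_and_scan(v, m, width):
    # Naive reference: AND-accumulate bit by bit, restarting the
    # accumulator at bit 0 and wherever m has a set bit.
    out = 0
    acc = 1
    for i in range(width):
        bit = (v >> i) & 1
        if i == 0 or (m >> i) & 1:
            acc = bit
        else:
            acc &= bit
        out |= acc << i
    return out


if __name__ == "__main__":
    # Exhaustive comparison over all 8-bit inputs.
    for v in range(256):
        assert and_scan(v) == ref_seg_and_scan(v, 0, 8)
        for m in range(256):
            assert seg_and_scan(v, m) == ref_seg_and_scan(v, m, 8)
    print(bin(seg_and_scan(0b10110111, 0b00010001)))  # 0b110111
```

With m = 0 the segmented version reduces to the plain and-scan, which is why the reference takes m = 0 for the first check.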
February 17, 2025 at 4:42 AM
Just "dzaima" is fine.
January 2, 2025 at 1:39 PM
AFAICT the final code handles FMAdd(1e200, 1e200, -inf) incorrectly: the base a*b+c ends up as NaN (1e200*1e200 overflows to +inf, and +inf + -inf is NaN), which is returned via the IsInfiniteOrNaN check (and bias would be 1.0, as testResult is NaN), but the result should be -inf.
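
A small sketch of that failure mode. It is not the code under discussion: naive_fma and fma_with_inf_check are hypothetical names, and the second function only shows one possible guard for an infinite addend; it is not a correctly rounded FMA.

```python
import math


def naive_fma(a, b, c):
    # Unfused a*b+c: the product is rounded (and can overflow to inf)
    # before c is added.
    return a * b + c


def fma_with_inf_check(a, b, c):
    # Hypothetical guard: a finite product plus an infinite addend is
    # that infinity, even if the rounded product would overflow.
    if math.isinf(c) and math.isfinite(a) and math.isfinite(b):
        return c
    return a * b + c


if __name__ == "__main__":
    print(naive_fma(1e200, 1e200, -math.inf))           # nan
    print(fma_with_inf_check(1e200, 1e200, -math.inf))  # -inf
```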
January 2, 2025 at 12:05 PM