Lightnews — Scholar-powered news

Gegell

@gegellibu.bsky.social

All of the previous implementations and an exhaustive test suite that verifies all 2^16 inputs are available on Shadertoy: www.shadertoy.com/view/wfVyz3

December 6, 2025 at 10:12 PM

Gegell

@gegellibu.bsky.social

Also, it turns out that for the 3D variant the additional space between the active bits can be used even better than in the 2D version, reducing the combine step for Morton values from a 512^3 grid to only 5 operations!

And for Part1By2 we have enough space to also use the multiplies there.
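As a rough illustration of the Part1By2 remark, here is a minimal Python sketch. It assumes the widely used 10-bit Part1By2 masks and simply swaps each `x | (x << n)` for a multiplication by `1 + 2**n`. The names `part_1_by_2_mul` and `encode_morton3` are hypothetical, and this is not the 5-operation combine step mentioned above, whose code is not shown in the post.

```python
def part_1_by_2_mul(x):
    """Spread the low 10 bits of x so that bit i lands at position 3*i.

    Sketch only: standard Part1By2 masks, with each x | (x << n)
    replaced by x * (1 + 2**n). This is safe because the shifted copies
    never overlap, so the addition inside the multiply never carries.
    """
    x &= 0x000003FF
    x = (x * 0x10001) & 0xFF0000FF  # x | (x << 16)
    x = (x * 0x101) & 0x0300F00F    # x | (x << 8)
    x = (x * 0x11) & 0x030C30C3     # x | (x << 4)
    x = (x * 0x5) & 0x09249249      # x | (x << 2)
    return x


def encode_morton3(x, y, z):
    """Interleave three 10-bit coordinates into one 30-bit Morton code."""
    return (part_1_by_2_mul(x)
            | (part_1_by_2_mul(y) << 1)
            | (part_1_by_2_mul(z) << 2))
```

Each mask also drops anything the multiply pushed past bit 31, so the Python version needs no explicit 32-bit truncation.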

December 6, 2025 at 10:12 PM

Gegell

@gegellibu.bsky.social

Additionally, we can make use of the word boundary to remove the last 0xffff_0000 mask. Shifting left by 1 discards any bits generated past the 16th bit. Bits 15-8, which still hold leftover values, are cleared by the final shift back towards the LSB.

The shift can be combined into any of the multiplications.

Improved bit packing in which the last masking operation is removed by exploiting the 32-bit word boundary. This reduces the number of operations to 4 ANDs, 4 MULTs and 1 shift right.
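The code itself lives in the attached image, so here is a hedged Python reconstruction of what the 9-operation version could look like. The masks and multipliers were derived for this sketch, not copied from the post, and `compact_1_by_1_wrap` is a hypothetical name. The extra left shift by 1 is folded into the last multiplier (514 = 2 * 257).

```python
MASK32 = 0xFFFFFFFF  # Python ints are unbounded, so 32-bit wraparound is emulated


def compact_1_by_1_wrap(x):
    """Gather the even-indexed bits of a 32-bit value into the low 16 bits.

    Sketch only: 4 ANDs, 4 multiplies and 1 right shift when run on a
    real 32-bit unsigned type, where the truncation below costs nothing.
    """
    x &= 0x55555555
    x = (x * 3) & 0x66666666    # x | (x << 1), keep 2-bit groups
    x = (x * 5) & 0x78787878    # x | (x << 2), keep 4-bit groups
    x = (x * 17) & 0x7F807F80   # x | (x << 4), keep 8-bit groups
    # (x << 1) | (x << 9): the result lands in bits 31..16, overflow past
    # bit 31 is dropped by the word boundary, and leftovers in bits 15..8
    # vanish in the final shift.
    x = (x * 514) & MASK32      # emulates uint32 wraparound, not a real mask
    return x >> 16
```

In GLSL or C with a 32-bit unsigned type the `& MASK32` would not appear, which is where the saved AND comes from.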

December 6, 2025 at 2:59 AM

Gegell

@gegellibu.bsky.social

The same thing can be done with the original code, except that this time we collect the bits towards the MSB instead of the LSB, flipping >> to <<. Doing this reduces the number of operations to only 10.

Replacing (x | (x >> v)) with a single multiplication to save an operation per line. The final code consists of 5 ANDs, 4 MULTs and 1 shift right, totaling 10 operations.
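Since the actual code is only in the image, the following Python sketch is a reconstruction under assumptions: the masks were worked out for this example by collecting the bits towards the MSB, and the function names are hypothetical. It does match the stated count of 5 ANDs, 4 multiplies and 1 right shift.

```python
def compact_1_by_1_naive(x):
    """Reference: pick out bits 0, 2, 4, ..., 30 one at a time."""
    return sum(((x >> (2 * i)) & 1) << i for i in range(16))


def compact_1_by_1_mul(x):
    """Same result with 5 ANDs, 4 multiplies and 1 right shift (sketch).

    Each multiply by 1 + 2**v stands in for x | (x << v). The copies
    never overlap, so no carries occur, and each mask keeps the merged
    groups while dropping the stray halves.
    """
    x &= 0x55555555
    x = (x * 3) & 0x66666666     # pairs at bits 4k+1..4k+2
    x = (x * 5) & 0x78787878     # nibbles at bits 8m+3..8m+6
    x = (x * 17) & 0x7F807F80    # bytes at bits 16n+7..16n+14
    x = (x * 257) & 0x7FFF8000   # all 16 bits at bits 30..15
    return x >> 15
```

Comparing `compact_1_by_1_mul` against `compact_1_by_1_naive` over a range of inputs is a cheap way to catch a wrong mask.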

December 6, 2025 at 2:59 AM

Gegell

@gegellibu.bsky.social

For the above example if x, (x << a) and (x << b) all have bits in non-overlapping regions, we can safely replace the | with a +. Remembering that x << n = x * (2**n) we then write x + x * (2**a) + x * (2**b) = x * (1 + 2**a + 2**b) effectively factoring it to a single multiplication.

Long multiplication with bitwise values. Shows that a multiplication is just the addition of the same value shifted by different amounts. If set bits don't overlap, no carrying is necessary.
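A quick way to convince yourself of the identity is to try it in Python. The helper names below are made up for this illustration. The second pair of calls shows what goes wrong when the shifted copies do overlap.

```python
def or_of_shifts(x, a, b):
    """The original form: three copies of x combined with OR."""
    return x | (x << a) | (x << b)


def as_multiply(x, a, b):
    """The factored form: x * (1 + 2**a + 2**b)."""
    return x * (1 + (1 << a) + (1 << b))


# x = 0b101 is 3 bits wide, so shifts of 4 and 8 keep the copies apart.
print(or_of_shifts(0b101, 4, 8), as_multiply(0b101, 4, 8))  # 1365 1365

# With a shift of 1 the copies of 0b11 overlap: the addition carries
# and the two forms disagree.
print(or_of_shifts(0b11, 1, 8), as_multiply(0b11, 1, 8))    # 775 777
```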

December 6, 2025 at 2:59 AM

Gegell

@gegellibu.bsky.social

Note that your `decode_morton2_65536x65536` should use
`_compact_1_by_1(uvec2(x, x >> 1))` instead of `uvec2(x, x << 1)`. Otherwise you're discarding the highest bit and unintentionally multiplying one value by 2.
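To make the difference concrete, here is a small Python sketch with naive loop-based helpers. All names are stand-ins for this example and are not the code being replied to. It assumes x occupies the even bits and y the odd bits of a 32-bit code.

```python
MASK32 = 0xFFFFFFFF  # emulate a 32-bit unsigned word


def compact_1_by_1(v):
    """Naive reference: gather bits 0, 2, 4, ..., 30 into the low 16 bits."""
    return sum(((v >> (2 * i)) & 1) << i for i in range(16))


def encode_morton2(x, y):
    """Interleave two 16-bit values: x on even bits, y on odd bits."""
    code = 0
    for i in range(16):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def decode_morton2(code):
    """Shift right by 1 so the odd (y) bits land on even positions."""
    return compact_1_by_1(code), compact_1_by_1(code >> 1)


def decode_morton2_wrong(code):
    """Shift left by 1: y comes back doubled and its top bit is lost."""
    return compact_1_by_1(code), compact_1_by_1((code << 1) & MASK32)


print(decode_morton2(encode_morton2(3, 5)))             # (3, 5)
print(decode_morton2_wrong(encode_morton2(3, 5)))       # (3, 10)
print(decode_morton2_wrong(encode_morton2(0, 0x8001)))  # (0, 2)
```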

December 5, 2025 at 8:40 PM
