Sebastian Aaltonen
sebaaltonen.bsky.social
Building a new renderer at HypeHype. Former principal engineer at Unity and Ubisoft. Opinions are my own.
"Parkour City" was one of the first bigger games in HypeHype. It looked pretty bad two years ago.

With our new renderer, this game today looks pretty good. I did a minor remix: Changed the visual style preset and tweaked the sun and fog settings.
December 19, 2024 at 1:06 PM
3->4 shadow cascades performance (M3 Max):

3 cascades:
Vertex = 0.20ms
Pixel = 0.07ms

4 cascades:
Vertex = 0.39ms (195%)
Pixel = 0.11ms (157%)

On M3 Max, a total cost of 0.5ms for good-looking shadows is a no-brainer. However, paying 185% of the three-cascade cost is too much for low- and mid-tier phones.
December 6, 2024 at 11:04 AM
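The percentages in the post above can be reproduced from the quoted timings. This is a minimal sketch, not the author's tooling; the dictionary layout and the `relative_cost` helper are assumptions made for illustration.

```python
# Timings quoted in the post (milliseconds, M3 Max).
three_cascades = {"vertex": 0.20, "pixel": 0.07}
four_cascades = {"vertex": 0.39, "pixel": 0.11}


def relative_cost(new_ms, old_ms):
    """Return the new cost as a whole-number percentage of the old cost."""
    return round(100.0 * new_ms / old_ms)


# Total shadow cost is the sum of the vertex and pixel timings.
total_three = sum(three_cascades.values())  # about 0.27 ms
total_four = sum(four_cascades.values())    # about 0.50 ms

vertex_pct = relative_cost(four_cascades["vertex"], three_cascades["vertex"])
pixel_pct = relative_cost(four_cascades["pixel"], three_cascades["pixel"])
total_pct = relative_cost(total_four, total_three)

print(f"vertex {vertex_pct}%, pixel {pixel_pct}%, total {total_pct}%")
```

The 185% figure is the combined vertex and pixel time with four cascades (0.50ms) relative to three cascades (0.27ms).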
Today, I added storage buffer support to HypeHype RHI and all the backends: WebGPU, Metal, and Vulkan.

I tested changing some shaders to SSBO instead of UBO. Works on all platforms!

Metal was the easiest. Buffers are just GPU pointers in Metal. There's no UBO or SSBO. Zero changes...
December 5, 2024 at 3:39 PM
Crop of the first image (medium distance). Four shadow cascades (right) look much nicer.
December 5, 2024 at 10:08 AM
Close-up (cascade 1):

The shadow map atlas is 4096x4096 pixels on PC. Each cascade is a 2048x2048 region. We render them in a single pass, changing the viewport rect between cascades.
December 5, 2024 at 10:04 AM
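The atlas layout described above (a 4096x4096 atlas split into 2048x2048 cascade regions) can be sketched as a list of viewport rectangles. This is only an illustration: the row-major ordering of cascades and the `cascade_viewports` name are assumptions, since the post does not say how the regions are arranged.

```python
def cascade_viewports(atlas_size, cascade_size):
    """Return (x, y, width, height) viewport rects for each cascade region.

    Assumes a square atlas and a row-major layout (hypothetical; the
    actual renderer may order its cascades differently).
    """
    per_row = atlas_size // cascade_size
    rects = []
    for index in range(per_row * per_row):
        x = (index % per_row) * cascade_size
        y = (index // per_row) * cascade_size
        rects.append((x, y, cascade_size, cascade_size))
    return rects


# PC configuration from the post: four 2048x2048 regions in a 4096x4096 atlas.
viewports = cascade_viewports(4096, 2048)
for cascade, rect in enumerate(viewports):
    # In a real renderer, the viewport would be set to this rect
    # before issuing the draws for this cascade within the same pass.
    print(f"cascade {cascade}: viewport {rect}")
```

With these sizes the atlas holds exactly four regions, which matches the four-cascade configuration used on high-end devices.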
Added a fourth shadow cascade for high-end devices.

This is how it looks at medium distance (cascade 2):

Left = 3 cascades
Right = 4 cascades

(click to enlarge, left/right arrows to toggle)
December 5, 2024 at 10:02 AM
You can barely do outdoors with such a limited polygon budget. Dense forests are out of the question.

Half-Life 2 compared to Alan Wake 2 below:
December 2, 2024 at 2:20 PM
This is one of the reasons Adreno is so far behind Intel and AMD iGPUs on GPU-driven workloads such as Rainbow Six Siege and UE5 Nanite.

Qualcomm has to do the same thing AMD and Nvidia did years ago: stop routing all memory loads through their texture units. A fast raw load path is crucial.
December 2, 2024 at 2:14 PM
This is how vertex work currently overlaps with pixel work. We set up barriers in Vulkan and Metal to block only pixel work. This is an approx. 20% perf gain on G57 when heavy vertex work overlaps the previous frame. On Apple (left) it is 10%, but more if heavily GPU bound.
November 29, 2024 at 11:38 AM
Now that we have shipped WebGPU and Android will be 100% Vulkan soon, we have started designing our compute shader pipeline: GPU-driven rendering, virtual shadow maps, compute stochastic lighting, etc.

But this is not without issues on mobile. I worry about vertex work overlap...
November 29, 2024 at 11:30 AM
Just pushed my WebGPU gfx backend to production.

Took 21 days to develop in total: native Dawn on Mac + Windows first, then Emscripten for web. Runs really smoothly in the browser and the page load is super fast.

Zero issues on Mac, Windows or Chrome. Safari Tech Preview has some flickering tris.
November 28, 2024 at 2:53 PM
Another one :)
November 28, 2024 at 2:04 PM
User-generated content (UGC) in a nutshell: Massive spam of physics objects and skinned characters.

Runs well at a locked 60Hz on modern phones. The problem is cheap 99€ Android phones with a PowerVR GE8320 :(
November 28, 2024 at 1:48 PM
This is how fast the GTAO runs on Mali G57 MP2. This phone GPU is roughly 5x slower than the iPhone 6s (a 9-year-old flagship).
November 28, 2024 at 8:35 AM
Yes. It works fine on new $57 and 3-year-old $99 test phones.

GTAO timing here at quarter resolution on Mali G57:
November 28, 2024 at 8:32 AM
Have been tuning HypeHype's custom GTAO implementation for two days: Morton sampling pattern, bilateral blur uses a slope-based weight (instead of constant Z), 2x2 depth downsample passes use median (instead of max). There is still work to do, but it is starting to look pretty good.

Game: "Block City" by Blue Mint
November 27, 2024 at 5:22 PM
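The difference between a median and a max 2x2 depth downsample, as mentioned in the post above, can be sketched on the CPU. This is not HypeHype's shader code: the function names are hypothetical, and defining the median of four samples as the mean of the two middle values is an assumption, since the post does not say how the median is taken.

```python
def median_of_four(block):
    """Median of four depth samples, taken here as the mean of the two
    middle values (an assumption; picking one middle sample also works)."""
    ordered = sorted(block)
    return 0.5 * (ordered[1] + ordered[2])


def downsample_depth_2x2(depth, reducer):
    """Reduce each 2x2 block of a depth image (list of rows) to one value."""
    height, width = len(depth), len(depth[0])
    result = []
    for y in range(0, height - 1, 2):
        row = []
        for x in range(0, width - 1, 2):
            block = [depth[y][x], depth[y][x + 1],
                     depth[y + 1][x], depth[y + 1][x + 1]]
            row.append(reducer(block))
        result.append(row)
    return result


# A 2x4 depth tile: the left block contains one far-away outlier (1.0).
depth_tile = [
    [0.1, 0.2, 0.9, 0.9],
    [0.3, 1.0, 0.9, 0.9],
]
median_result = downsample_depth_2x2(depth_tile, median_of_four)
max_result = downsample_depth_2x2(depth_tile, max)
print("median:", median_result)
print("max:   ", max_result)
```

In the left block, the max reducer lets the single far sample dominate, while the median stays close to the majority of the samples. That is one plausible reason to prefer it for ambient occlusion input.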
Testing video upload. Seems to have a 50MB limit...
November 27, 2024 at 4:58 PM