Longhorn
@never-released.mastodon.social.ap.brid.gy
Kernel/hypervisor engineer, Amazon EC2.

[bridged from https://mastodon.social/@never_released on the fediverse by https://fed.brid.gy/ ]
hypocrisy²

it's (not so) funny how the right of self-determination is so variable as if some polities deserve it and not others.
December 27, 2025 at 8:00 PM
Memory Management: Changes in QNX8
Along with a new micro-kernel, QNX 8 ships with a new memory manager. This is the fourth incarnation of this component since the introduction of the QNX Neutrino operating system. To understand the changes, let's examine the previous version of the memory manager.

### Memory Management in QNX 7

The memory manager shipped with QNX 7 is the result of a project started around 2012. The goal was to produce a memory manager optimized for on-demand paging. While some form of on-demand paging was supported in QNX 6.x, it was more of an add-on to an existing design than a first-class citizen.

Recall that on-demand paging means that physical memory is not (typically) allocated at `mmap()` time. Instead, the `mmap()` function just records what kind of memory needs to back a new range of virtual addresses. When a virtual address in this new range is first referenced (read from or written to), a page fault occurs, which causes the memory manager to allocate a physical page, initialize it with the right contents, and install a new page table entry to reflect the virtual-to-physical translation.

On-demand paging provides a few desirable properties:

1. much cheaper `mmap()` calls, since `mmap()` does not need to allocate memory or manipulate page tables;
2. copy-on-write for `fork()`, which is especially beneficial if `fork()` is followed by `exec*()` and the cloned address space is discarded;
3. the ability to update virtual-to-physical translations at run time, which enables features such as page stealing and swapping;
4. a lower memory footprint, in case a process does not reference all of the memory it has asked for with calls to `mmap()`.

The memory manager shipped with QNX 7 provides these features, and yet we replaced it in QNX 8. Why?

### What's Wrong with On-Demand Paging?

Before we move on, I would like to emphasize that the criticism of on-demand paging expressed below should be taken in context. Specifically, the context is a low-latency, real-time operating system used in safety applications. On-demand paging has proved successful in other environments for decades, and I am not advocating for its replacement there.

#### Over-Committing Memory

Let's start from the last point: lower memory usage. This is true only when a process allocates memory via `mmap()` but does not use all of it. If a process uses all the memory it has allocated, then on-demand paging has no advantage in terms of memory usage.[1]

These perceived savings only matter if the sum total of allocations exceeds the amount of memory available in the system. In such a case, the system _over-commits_ memory: it has promised processes more memory than can be had. As long as these processes never use all of the promised memory at once, all is well, but if they happen to do so they will start failing (manifested as a `SIGBUS` signal on reading or writing memory).

Linux (and other systems) have no problem over-committing memory. QNX has made the decision not to do that: if an `mmap()` call succeeds then, regardless of on-demand paging, the process cannot fail to access memory it was promised. This is important for safety systems, which need to ensure that critical processes cannot fail arbitrarily due to changes in the state of the system. As long as these processes allocate their resources up-front, these resources are guaranteed to be available.
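To make this concrete, here is a minimal sketch (plain POSIX C, not code from the post; the 1 GiB/1 MiB sizes are arbitrary) of the pattern being discussed: a program that maps far more anonymous memory than it touches. With on-demand paging, the untouched pages never consume physical memory; under QNX's no-over-commit rule, the whole region has to be accounted for the moment `mmap()` succeeds.

```c
/* Hedged sketch: large anonymous mapping, mostly untouched.
 * Illustrates why "lower memory footprint" only helps programs
 * that map more than they use. POSIX C; sizes are arbitrary. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t map_size  = 1UL << 30;   /* ask for 1 GiB */
    size_t used_size = 1UL << 20;   /* ...but touch only 1 MiB */

    unsigned char *p = mmap(NULL, map_size, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        /* Under a reservation scheme, this is where a system that
         * refuses to over-commit says no. */
        perror("mmap");
        return EXIT_FAILURE;
    }

    /* With on-demand paging, physical pages are allocated (and the
     * page faults taken) only for the range we actually write. */
    memset(p, 0xAA, used_size);

    printf("mapped %zu MiB, touched %zu MiB\n",
           map_size >> 20, used_size >> 20);

    munmap(p, map_size);
    return EXIT_SUCCESS;
}
```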
How do you implement on-demand paging without over-committing? At `mmap()` time, even though physical memory is not allocated, the system does account for it, using a _reservation_. Reserving memory is a simple matter of subtracting the number of requested pages from the total available. A call to `mmap()` fails if it cannot reserve memory.[2]

The reservation scheme solves the `SIGBUS` problem, but it negates any benefits of on-demand paging with respect to memory usage. The fact that physical memory has not been allocated does not mean it is available to an `mmap()` call that needs more memory than the system can provide.

#### Time to Map

On-demand paging speeds up the `mmap()` call itself, but at the cost of later page faults that perform the bulk of the work. The overall time taken by `mmap()` plus handling these faults is actually higher than performing the same work up front, due to:

1. the cost of page fault handling (which is higher on more sophisticated hardware);
2. the loss of optimizations available when allocating and setting up page tables for larger chunks of memory.

From a performance point of view, on-demand paging benefits the same misbehaving programs that show lower memory usage, i.e., those that map considerably more memory than they need.

Things get worse when trying to avoid over-committing memory. Even though the `mmap()` call does not allocate physical memory or install last-level page table entries, it does need to ensure that all meta-data and any higher-level page table entries are in place for the mapping. Otherwise, a page fault can fail to resolve. This explains why the map-without-access value for QNX 7 with on-demand paging is higher than Linux's, as the latter does nothing other than carve out the virtual address space at `mmap()` time.

#### Latency

Page faults introduce variability in execution time, which can be quite high. For time-sensitive, real-time applications, such variability is unacceptable. Consequently, real-time applications tend to ensure all memory is backed at `mmap()` time (using the POSIX `mlock()` or `mlockall()` calls), which disables on-demand paging. For safety systems, QNX recommends that the entire system disable on-demand paging, a configuration known as _superlocking_. This is the equivalent of calling `mlockall(MCL_CURRENT | MCL_FUTURE)` in each process when it is created. In such a system, all the data and time spent on supporting on-demand paging is pure overhead.
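Since superlocking is described as the per-process equivalent of `mlockall(MCL_CURRENT | MCL_FUTURE)`, here is a minimal sketch, assuming an ordinary POSIX environment, of how an individual real-time process opts into fully backed, locked memory up front:

```c
/* Sketch: per-process "superlocking" via POSIX memory locking.
 * MCL_CURRENT backs and locks everything mapped so far;
 * MCL_FUTURE does the same for mappings created later. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return EXIT_FAILURE;
    }

    /* From here on, any mmap() that succeeds is already backed by
     * physical memory, so the time-critical code that follows does
     * not take demand-paging faults on this process's memory. */
    puts("memory locked; entering real-time section");

    return EXIT_SUCCESS;
}
```

Real-time code typically pairs this with allocating and initializing all of its buffers during start-up, which matches the "allocate resources up-front" requirement mentioned earlier.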
#### Swapping

One of the major features of the memory manager shipped with QNX 7 is support for swap devices. Swap allows the system to behave as though it has more RAM than is actually available, by stashing the contents of non-file-backed pages on the swap device, allowing those pages to be stolen and reused. We tried to use this feature on the BB10 phones circa 2013, and the results were not good.

Page stealing is always costly, as it requires expensive page table manipulation. But when it also involves writing out the contents of a page, and then reading them back when restoring the page, page stealing performs much worse.

Swap devices typically fall into one of two categories: storage-based and RAM-based. The latter is feasible when using compression, as the amount of memory taken by page contents on the swap device is expected to be much lower than the size of the page. This is only true, however, if the page contents can be compressed efficiently. On modern systems, a considerable part of memory is taken by data that does not compress well (because it is already compressed, like MP3 files or JPEG images), or by data that cannot be moved into swap (graphics surfaces). On the other hand, the storage devices on embedded systems tend to have a limited number of write cycles, and are subject to excessive wear if used for swap.

### Enter QNX 8

QNX 8 features a brand new micro-kernel design. During the work on the new kernel I struggled with the question of how to incorporate support for on-demand paging. Handling recoverable page faults that occur in the context of user-mode execution is relatively straightforward (and even easier with the new design than with the old one). However, handling such faults in the context of the kernel (i.e., faulting on user addresses inside kernel calls) is much harder.

Since the primary focus of QNX these days is on safety systems, and since we recommend superlocking on these systems anyway, it occurred to me that we may not need on-demand paging at all. I wanted to see what gains could be had by dropping support for on-demand paging from the memory manager, simplifying its design and allowing for greater optimization opportunities.

Support for on-demand paging in the memory manager has resulted in a single-page-oriented design: all code paths are meant to deal with one page at a time. While there are some opportunities for batching operations, they still end up doing quite a bit of work for each page individually, simply because of the way the code is structured. When applying superlocking, these operations are repeated for each page in every `mmap()` call: carve out a virtual address range, allocate meta-data structures, allocate physical memory, initialize the memory, set up page-table entries. A single-page-oriented design ends up doing these individually per page, instead of applying each operation to the whole range.

At this point, I wish I could claim that I invented some sophisticated algorithms for memory management that improve performance dramatically. The mundane reality, though, is that I simply did the following:

1. Remove any code and data structures required only for the support of on-demand paging, page stealing and swapping.
2. Consolidate and optimize loops to facilitate the batching of all operations involved in an `mmap()` call.

The results of changing the memory manager to a range-oriented design were beyond what I had hoped for. The memory manager is much simpler (and thus easier to analyze for safety), and it is much faster for any `mmap()` operation that involves more than one page. The graph below shows the time it takes to map a 1MB region using anonymous memory (i.e., where the source physical backing is anywhere in allocatable RAM and where the memory is zero-initialized). The tests were conducted on a SolidRun Honeycomb board, with 16 ARMv8 A72 cores at 2GHz. Not only is QNX 8 faster than QNX 7, it is also faster than Linux, if you assume that the memory is actually used, not just mapped.[3]

The change didn't just have an effect on micro-benchmarks, though. My Dell desktop (Gen 12 Intel i7) boots to a browser displaying a page in 700ms, versus close to 3 seconds with on-demand paging. On a SolidRun Honeycomb board, parallel compilation (using `make -j`) now takes advantage of up to 12 cores, where before it peaked at 4.
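As a rough illustration of the kind of measurement described above (not the author's actual benchmark harness), the following POSIX C sketch times `mmap()` of a 1 MB anonymous region on its own, and then the mapping plus a write to every page (the "memory is actually used" case):

```c
/* Sketch of a map-and-touch micro-benchmark (POSIX C).
 * Results will vary by OS, page size and hardware. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

int main(void)
{
    const size_t size = 1UL << 20;                  /* 1 MiB */
    const size_t page = (size_t)sysconf(_SC_PAGESIZE);

    uint64_t t0 = now_ns();
    volatile unsigned char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    uint64_t t1 = now_ns();
    if (p == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }

    /* Touch one byte per page so every page is actually backed. */
    for (size_t off = 0; off < size; off += page)
        p[off] = 1;
    uint64_t t2 = now_ns();

    printf("map only    : %8llu ns\n", (unsigned long long)(t1 - t0));
    printf("map + touch : %8llu ns\n", (unsigned long long)(t2 - t0));

    munmap((void *)p, size);
    return EXIT_SUCCESS;
}
```

On a demand-paged system most of the cost shows up in the touch loop; with everything backed at `mmap()` time, it shows up in the mapping call instead, which is the trade-off the two designs make.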
This change to the memory manager does not remove any functionality, other than support for swap devices.[4] The system still supports anonymous, physical and file-backed memory objects, shared and private mappings, the handling of `fork()`, `exec*()` and `posix_spawn()`, etc. It just does it with all physical memory backed at `mmap()` time, and with that backing remaining constant until the memory is unmapped.

So what are the downsides?

Any program that is written to ask for significantly more memory than it needs suffers under superlocking. While QNX 8 is much faster in the superlocking case than QNX 7, it is slower than on-demand paging when a program asks for 1GB of memory and touches 1MB. This is actually quite common in the Linux world (I've seen VS Code reporting memory usage of over 1TB, which it clearly doesn't use). The answer to the problem, especially in the context of safety systems, is "don't do that". Your program either needs this much memory, or it doesn't. And yes, the analysis may be hard, but it needs to be done.

A special case of this problem is stacks. The default thread stack size on QNX is 256KB. Traditionally, stacks are allocated lazily, and C/C++ programmers are not used to analyzing and specifying the required stack sizes (which are sometimes not even known, as in the case of recursion controlled by some program state). Without on-demand paging, stacks are fully backed at thread-creation time, with a one-size-fits-all allocation of 64 pages. With the proliferation of threads, this has an impact both on memory usage and on thread creation time. For safety systems we can still make the claim that you should analyze your code to determine the maximum stack size for each thread, and then create the thread with the appropriate stack size. For non-safety systems this is a much harder argument to make, especially in terms of cost/benefit.

Copy-on-write is no longer available for `fork()`, which means that `fork()` followed by `exec*()` does too much: all private mappings are copied at `fork()` time, and then lost as a new address space is created. This can be solved by replacing `fork()`+`exec()` with `posix_spawn()` (a sketch follows below).
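To illustrate the `posix_spawn()` alternative mentioned above, here is a minimal sketch (the spawned command is a placeholder, and error handling is kept short): the child's address space is constructed directly for the new program, so no private mappings are copied only to be discarded.

```c
/* Sketch: spawning a child without fork()+exec(), so no private
 * mappings are duplicated only to be thrown away by exec. */
#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

extern char **environ;

int main(void)
{
    /* Placeholder command; any argv works the same way. */
    char *argv[] = { "ls", "-l", NULL };
    pid_t pid;

    int err = posix_spawnp(&pid, argv[0], NULL /* file actions */,
                           NULL /* spawn attributes */, argv, environ);
    if (err != 0) {
        fprintf(stderr, "posix_spawnp: %s\n", strerror(err));
        return EXIT_FAILURE;
    }

    int status;
    if (waitpid(pid, &status, 0) == (pid_t)-1) {
        perror("waitpid");
        return EXIT_FAILURE;
    }
    if (WIFEXITED(status))
        printf("child %ld exited with status %d\n",
               (long)pid, WEXITSTATUS(status));
    return EXIT_SUCCESS;
}
```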
Will on-demand paging make a comeback in QNX 8? Maybe, though probably in a limited way (perhaps only for stacks). For now, however, the benefits of the new memory manager appear to outweigh its shortcomings considerably, especially when you look at complete, real-world systems.

1. There are still ways in which on-demand paging can save on memory, such as with copy-on-write pages that are only read but not written. ↩︎
2. Reservation is actually more complicated, as shared mappings only need to be reserved once, file-backed read-only mappings do not require reservation as they can be replaced, swap space needs to be taken into account, etc. ↩︎
3. Results are for Linux 5.10. I have also tested the latest kernel at the time this post was written, which is 6.12. It shows somewhat worse results than 5.10, but not enough to change the overall picture. ↩︎
4. One interesting side effect of superlocking is that writes to pages are not detected, which necessitates treating all shared-writable pages as dirty. While not technically incorrect, this limitation restricts the usefulness of such mappings. ↩︎

membarrier.wordpress.com
December 27, 2025 at 12:42 PM
"how does QNX handle mmap today with its microkernel arch" and well

> The QNX OS supports POSIX memory locking APIs, but only for compatibility reasons. These calls will silently succeed, but in this release all memory is "superlocked",

that's a way to deal with it I guess

QNX 7 is very different
December 26, 2025 at 8:46 PM
huh Xqci RISC-V extensions by Qualcomm for microcontrollers: https://github.com/quic/riscv-unified-db/releases/tag/Xqci-0.13.0
December 24, 2025 at 6:05 PM
Not a Processing Unit
December 23, 2025 at 5:43 PM
isn't it funny that a Core Ultra 9 Processor 275HX + a 5070 Ti Laptop doesn't meet the specs to be a Copilot+ PC? lmao
December 23, 2025 at 1:12 PM
This blog post shocked me: https://elisa.tech/ambassadors/2025/12/10/schrodingers-test-the-dev-mem-case/

> safety
> /dev/mem
> adding an arbitrary constraint of not being able to set aside some memory

come on. just what. just taking a peek at the "enabling Linux for safety applications" […]
Original post on mastodon.social
December 23, 2025 at 11:39 AM
sometimes the customer isn't always right tbh :/

imo very poor excuses here overall
December 23, 2025 at 11:34 AM
Some really oddball Linux ideas:

> Achieving ASIL B Qualified Linux while minimizing expectations from upstream kernel

from https://lpc.events/event/19/contributions/2132/attachments/1905/4076/LPC2025%20-%20Igor%20Stoppa.pdf […]

[Original post on mastodon.social]
December 23, 2025 at 11:20 AM
looks like the CUDA dynamic parallelism rework (CDP2, mandatory on Hopper onwards) is changed just enough to not actually _need_ dynamic kernel dispatch prior to exit.

but can be implemented on top of indirect command buffers unlike the old one
December 22, 2025 at 8:09 PM
arstechnica.com
December 22, 2025 at 4:53 PM
ah the reason why Epic Games has to give out free games is that their launcher is an atrocious product
December 22, 2025 at 9:13 AM
Replacing Helvetica with Arial across the board
December 21, 2025 at 7:10 PM
wow the Epic Games launcher is atrocious and powers on the dGPU (???)

like wtf
December 21, 2025 at 6:40 PM
ok Arm but nobody outside of Arm will ever do this

> An Armv8-A implementation can be called an AArchv8-A implementation and an Armv9-A implementation can be called an AArchv9-A implementation.
December 21, 2025 at 10:43 AM
so this 5070 Ti laptop only takes 100W over USB-C PD

and runs in a quite throttled mode over that.

To run properly it needs that giant 330W brick
December 21, 2025 at 6:46 AM
damn 5070 Ti laptop GPU throttles so much on a random gaming laptop when on battery

perf absolutely a _lot_ worse than when plugged (using ootb settings)
December 20, 2025 at 6:09 PM
GB10 has ASTC...

Quite a bit odd that all the Tegras from NVIDIA have ASTC but the dGPUs don't

https://vulkan.gpuinfo.org/displayreport.php?id=44645#device
December 19, 2025 at 12:08 PM
The NVIDIA Jetson Christmas ad this year is a bit of an oddball :D

https://www.youtube.com/watch?v=RJ8Yhy1OdYI
December 16, 2025 at 5:53 AM
RISC OS ARMv7 compatibility primer

Details the incompatible changes across AArch32 over the years, and how, despite those, it's possible to have the same binary run across all of them

https://www.riscosopen.org/wiki/documentation/show/ARMv7%20compatibility%20primer
RISC OS Open: ARMv7 compatibility primer in Library
www.riscosopen.org
December 15, 2025 at 11:02 PM
spirv2clc: https://github.com/kpet/spirv2clc

An OpenCL SPIR-V to OpenCL C translator, which lets recalcitrant OpenCL drivers still run SPIR-V binaries.
GitHub - kpet/spirv2clc: Experimental OpenCL SPIR-V to OpenCL C translator
github.com
December 15, 2025 at 2:08 PM
GPUs are fun
December 15, 2025 at 5:20 AM
clpeak, with clvk on KosmicKrisp
December 15, 2025 at 12:27 AM
clinical, synthetic hugs
provided by AI bot
December 14, 2025 at 7:13 PM