Peter Goodman
cxx.dev
C++ developer specializing in source and binary program analysis and transformation.
I'd love to hear about your solutions 😀
February 13, 2025 at 7:34 PM
How do you see the balance of value brought to the table between new models just being better vs. whatever smarts are encoded in the harnessing of those models?
December 7, 2024 at 6:31 AM
📌
December 7, 2024 at 6:24 AM
📌
December 6, 2024 at 6:50 PM
16/16 GRR and microx were my first major contributions at Trail of Bits, and represented a continuation of my DBT research from my M.Sc. at the University of Toronto. They were super fun projects to create and work on, and I'm extremely proud of both of them.
December 4, 2024 at 5:31 PM
15/16 What was humbling was that UDB itself was a record/replay x86-64 dynamic binary translator (DBT). So while I was toiling away trying to get my DBT working for DECREE, I was relying on their much more general system to debug mine!
December 4, 2024 at 5:30 PM
14/16 As you can imagine, debugging a dynamic binary translator can be tricky; when things go wrong, your debugger isn't as helpful because there's no debug information for just-in-time translated code. UndoDB's time-travelling debugger, UDB, was a productivity multiplier (undo.io/products/udb).
UDB
UDB is the time travel debugger for C/C++ applications running on Linux. Replay execution history to inspect program state and see what happened. Quickly debug race conditions, seg faults, stackoverfl...
undo.io
December 4, 2024 at 5:29 PM
13/16 At the time, the Unicorn engine didn't provide fine-grained information about instruction dependencies, and it was very crashy. Our attempts to use it had us concretizing any symbolic bytes in big swaths of the stack, artificially limiting the futures that the symbolic executor could explore.
December 4, 2024 at 5:29 PM
12/16 Fun segue: our pysymemu fork used microx (github.com/lifting-bits...), my fourth binary translator, to *natively* execute instructions that didn't have symbolic Python models. Microx allowed us to minimize how much symbolic state had to be concretized when executing instructions natively.
GitHub - lifting-bits/microx: Safely execute an arbitrary x86 instruction
Safely execute an arbitrary x86 instruction. Contribute to lifting-bits/microx development by creating an account on GitHub.
github.com
December 4, 2024 at 5:28 PM
11/16 GRR's snapshots could also be shared with a custom CGC-specific fork of pysymemu (github.com/feliam/pysym...). Fun fact: pysymemu evolved into the Manticore symbolic executor (github.com/trailofbits/...). This sharing allowed the fuzzer and symbolic execution components to "blindly" cooperate.
GitHub - feliam/pysymemu: An amd64 symbolic emulator
An amd64 symbolic emulator. Contribute to feliam/pysymemu development by creating an account on GitHub.
github.com
December 4, 2024 at 5:28 PM
10/16 Another cool thing was that GRR was deterministic and could produce and resume from program snapshots. The original motivation for this feature was to skip to the first read(2) system call, avoiding deterministic program setup costs.
December 4, 2024 at 5:28 PM
9/16 GRR was a fairly effective fuzzer, but the fuzzer logic wasn't nearly as smart as that of contemporaries such as AFL. Where the GRR fuzzer shone was that it could operate on the whole input or on individual system calls, doing things like repeating or swapping inputs at a finer granularity.
December 4, 2024 at 5:28 PM
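The finer-grained mutation described in post 9/16 could look roughly like the sketch below. It assumes, purely for illustration, that a test input is stored as one byte buffer per input system call; `Chunk`, `Input`, `RepeatChunk`, `SwapChunks`, and `Flatten` are hypothetical names, not GRR's actual code.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical representation: one byte buffer per input system call.
using Chunk = std::vector<std::uint8_t>;
using Input = std::vector<Chunk>;

// Repeat the chunk at index `i`, so the same data is delivered twice in a row.
// Out-of-range indices leave the input unchanged.
inline Input RepeatChunk(Input input, std::size_t i) {
  if (i < input.size()) {
    Chunk copy = input[i];  // Copy first so the insert never aliases the vector.
    input.insert(input.begin() + static_cast<std::ptrdiff_t>(i) + 1,
                 std::move(copy));
  }
  return input;
}

// Swap the data delivered by two system calls.
// Out-of-range indices leave the input unchanged.
inline Input SwapChunks(Input input, std::size_t i, std::size_t j) {
  if (i < input.size() && j < input.size()) {
    std::swap(input[i], input[j]);
  }
  return input;
}

// Concatenate all chunks, for mutators that treat the input as one blob.
inline Chunk Flatten(const Input &input) {
  Chunk whole;
  for (const Chunk &chunk : input) {
    whole.insert(whole.end(), chunk.begin(), chunk.end());
  }
  return whole;
}
```

Keeping the per-call boundaries around is what makes repeat and swap possible; a byte-level mutator can still be applied to the result of `Flatten` or to any single chunk.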
8/16 Faithfully emulating DECREE meant doing a lot of weird testing. One fun discovery was that write(2) will avoid returning an EFAULT as long as a minimum number of bytes have been read (github.com/lifting-bits...).
github.com
December 4, 2024 at 5:27 PM
7/16 DECREE, as released by DARPA, was implemented as a Linux kernel fork that loaded CGC binaries (really: slightly tweaked ELFs). It used a custom system call personality table that restricted loaded binaries to just their few system calls.
December 4, 2024 at 5:27 PM
6/16 To get Radamsa to work as a function meant compiling the Scheme to C using the OWL Lisp compiler, then patching that horrible output so that I could track its memory allocations and network calls, and invoke its main function as though it were any other normal function in a program.
December 4, 2024 at 5:26 PM
5/16 Another fun thing about GRR was that Radamsa (gitlab.com/akihe/radamsa) was embedded as a callable function. If you're familiar with Radamsa then this may come as a surprise -- Radamsa is written in a Scheme dialect, and normally sends inputs over the network.
Aki Helin / radamsa · GitLab
a general-purpose fuzzer
gitlab.com
December 4, 2024 at 5:26 PM
4/16 One cool thing is that GRR could handle self-modifying DECREE binaries, which made opening the code cache in IDA Pro or Binary Ninja fun, because you could browse the evolution of those code modifications.
December 4, 2024 at 5:26 PM
3/16 GRR translated x86 into x86-64, so that one or more DECREE binaries could run in 4 GiB (32-bit) memory spaces within its own much larger 64-bit address space. Translated code could be instrumented for code coverage, and cached to disk to amortize translation costs across GRR runs.
December 4, 2024 at 5:26 PM
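The address-space arrangement in post 3/16 can be illustrated with a little arithmetic. This is a minimal sketch under assumed names (`kGuestSpaceSize`, `GuestWindowBase`, `GuestToHost`) and an assumed contiguous layout of 4 GiB windows; it is not taken from GRR's source.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; GRR's real layout may differ.
constexpr std::uint64_t kGuestSpaceSize = 1ULL << 32;  // 4 GiB per 32-bit guest.

// Host address where guest number `index` has its 4 GiB window, assuming the
// windows are laid out back to back starting at `host_base`.
constexpr std::uint64_t GuestWindowBase(std::uint64_t host_base,
                                        std::size_t index) {
  return host_base + static_cast<std::uint64_t>(index) * kGuestSpaceSize;
}

// Translate a 32-bit guest address into a 64-bit host address. Because the
// guest address is only 32 bits wide, the result always stays inside the
// guest's own 4 GiB window.
constexpr std::uint64_t GuestToHost(std::uint64_t window_base,
                                    std::uint32_t guest_addr) {
  return window_base + guest_addr;
}
```

The appeal of this kind of layout is that isolation between guests falls out of the address width: no 32-bit guest pointer can reach past the end of its own window.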
2/16 DECREE programs are basically simplified 32-bit x86 Linux programs -- they can use only six or so system calls. GRR used dynamic binary translation, a just-in-time translation technique that rewrote the target program's machine code while it was running!
December 4, 2024 at 5:25 PM
15/15 Thanks (and sorry!) to the team of people who helped/suffered along the way! Also thanks to DARPA for funding this research through Sergey Bratus' Assured Micro-Patching (AMP) program.
December 3, 2024 at 7:30 PM
14/15 In summary, Dr. Lojekyll was one of the most fun projects I created at Trail of Bits. It was also the most brutal to debug. I learned that debugging declarative languages is hard, and debugging in-progress/broken compilers for declarative languages is harder.
December 3, 2024 at 7:30 PM
13/15 I always saw automated database factorization and nesting as the ultimate solution to the intermediate relation explosion problem, but we never had the time to address it, and Dr. Lojekyll's codebase was not flexible enough to make experimental extensions easy.
December 3, 2024 at 7:29 PM