Lightnews — Scholar-powered news

Christoph Lutz

@christophlutz.bsky.social

10/11
This behavior can be observed (and changed) with gdb - highly experimental (t.ly/lqdJ0)!

Examples:

November 5, 2025 at 6:50 AM

Christoph Lutz

@christophlutz.bsky.social

... problem is that these numbers are not compatible with the UUID specification, in which byte positions 6 and 8 are partially reserved for the version and variant information. Therefore, Oracle fixes these bytes in ztuguid. This can be observed with bpftrace (t.ly/ve_c1):

November 1, 2025 at 7:51 PM
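
To illustrate what "fixing" byte positions 6 and 8 means, here is a minimal Python sketch. It is not Oracle's ztuguid code: the post does not say which version is stamped, so version 4 and the RFC 4122 variant are assumed, and the helper name fix_uuid_bytes is hypothetical.

```python
import os
import uuid


def fix_uuid_bytes(raw: bytes, version: int = 4) -> bytes:
    """Stamp version and variant bits into 16 raw bytes (assumed: v4, RFC 4122)."""
    if len(raw) != 16:
        raise ValueError("a UUID is exactly 16 bytes")
    b = bytearray(raw)
    # Byte 6: the high nibble carries the version, the low nibble stays as-is.
    b[6] = (b[6] & 0x0F) | (version << 4)
    # Byte 8: the two top bits carry the variant (binary 10 for RFC 4122).
    b[8] = (b[8] & 0x3F) | 0x80
    return bytes(b)


if __name__ == "__main__":
    fixed = fix_uuid_bytes(os.urandom(16))
    u = uuid.UUID(bytes=fixed)
    print(u, u.version, u.variant)
```

Only 6 bits are overwritten; the remaining 122 bits keep whatever the generator produced.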

Christoph Lutz

@christophlutz.bsky.social

Weekend fun: snooping inter-process messaging (ksbasend) in Oracle with bpftrace. 🤓

t.ly/_yDy7

October 31, 2025 at 8:06 PM

Christoph Lutz

@christophlutz.bsky.social

When you plan to geek out over some Oracle internals, but end up ftrace’ing bpf the entire weekend to chase a funny bug that only occurs on Exadata with capacity on demand ...

October 5, 2025 at 5:30 PM

Christoph Lutz

@christophlutz.bsky.social

Yet another adaptive lgwr optimization: on Exadata X10+, pipelined log writes may defer redo writes until a suitably sized write batch has accumulated in the log buffer.

The deferral can involve spinning in a tight loop up to 25 times (maximum hard-coded in kcrfw_defer_write).

September 21, 2025 at 12:09 PM
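
The bounded deferral described above can be sketched as a capped spin loop. Only the cap of 25 comes from the post; the batch-size check, the callable pending_bytes, and the parameter names are assumptions made for illustration, not the logic of kcrfw_defer_write.

```python
from typing import Callable

MAX_DEFER_SPINS = 25  # hard-coded maximum mentioned in the post


def defer_write(pending_bytes: Callable[[], int],
                min_batch_bytes: int,
                max_spins: int = MAX_DEFER_SPINS) -> int:
    """Spin until enough redo has accumulated or the spin cap is reached.

    Returns the number of spins performed before the write would be issued.
    pending_bytes is a hypothetical probe of the log buffer fill level.
    """
    spins = 0
    while spins < max_spins and pending_bytes() < min_batch_bytes:
        spins += 1
    return spins


if __name__ == "__main__":
    # The buffer never fills up: the loop gives up after the cap.
    print(defer_write(lambda: 0, min_batch_bytes=4096))
```

The cap guarantees that a quiet system never delays a redo write indefinitely.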

Christoph Lutz

@christophlutz.bsky.social

Nested loops, baby 😜

September 18, 2025 at 8:40 AM

Christoph Lutz

@christophlutz.bsky.social

September 13, 2025 at 11:28 AM

Christoph Lutz

@christophlutz.bsky.social

So glad that all new features are documented so well... NOT 😜

Manually enabling and disabling adaptive lgwr evaluation trace for pipelined / overlapped redo writes:

September 4, 2025 at 4:35 PM

Christoph Lutz

@christophlutz.bsky.social

"Slide n of 142".. this is getting out of control... 🙈

August 12, 2025 at 7:57 PM

Christoph Lutz

@christophlutz.bsky.social

6/6
As always, bpftrace is very useful for observing and studying undocumented behavior:

Trace write info array updates (LGWR/LGnn): t.ly/VV--a
Trace write info array scans (FG): t.ly/67R-J

August 4, 2025 at 8:03 AM

Christoph Lutz

@christophlutz.bsky.social

1/6
The "redo synch time overhead" in Oracle is the difference between a FG's log file sync (LFS) wait end time and LGWR's redo write completion time.

LGWR and LG workers track the redo write completion times in the "write info array" in the SGA.

August 4, 2025 at 8:01 AM
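
As a worked example of the definition above, the overhead is a plain difference of two timestamps. The function name and the microsecond values are hypothetical; how Oracle stores or reads the write info array is not modelled here.

```python
def redo_synch_overhead_us(lfs_wait_end_us: int, redo_write_done_us: int) -> int:
    """Overhead = FG log file sync wait end time minus LGWR redo write completion time."""
    return lfs_wait_end_us - redo_write_done_us


if __name__ == "__main__":
    # Hypothetical timestamps in microseconds: the write completed at 1_000_300
    # and the foreground's log file sync wait ended at 1_000_450.
    print(redo_synch_overhead_us(1_000_450, 1_000_300))
```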

Christoph Lutz

@christophlutz.bsky.social

lgwr: You have the control

gdb: I have the control

August 3, 2025 at 4:45 PM

Christoph Lutz

@christophlutz.bsky.social

Works in my case, but needs a little help from gdb. 😉

The check occurs at query compile time (kkb), kkbkauxbll checks a session flag (at offset saddr + 0x1012) and raises an ORA-46385 if not set to 0x10 (on 19.26).

It’s not exposed in x$ksuse, but I haven’t yet checked where or how it’s set.

July 31, 2025 at 11:59 AM

Christoph Lutz

@christophlutz.bsky.social

3/4
This example illustrates how many workers lgwr assigns to handle a parallel write for each combination of "active redo strands" and "log write parallelism", assuming a maximum of 16 public redo strands (this is purely computational, not all combinations have been tested):

July 29, 2025 at 12:18 PM

Christoph Lutz

@christophlutz.bsky.social

The trace below shows lg00 submitting a log write at T4 (lwn scn 0x70f14b55773) while lg04's earlier write submitted at T2 (lwn 0x70f14b55768) is still in flight, suggesting Pipelined Log Writes. But the trace is from an X8, not X10, where the feature is documented ...🤔 (I/Os throttled for testing)

July 27, 2025 at 2:36 PM

Christoph Lutz

@christophlutz.bsky.social

7/7
Fast Sync wait behavior can be observed with the following bpftrace script: t.ly/9K1Mc

All of the above was tested on 19.26 (with RAC on Exadata).

July 24, 2025 at 8:53 AM

Christoph Lutz

@christophlutz.bsky.social

6/7
Computation of the adaptive sleep duration is quite complex and based on additional runtime counters maintained by fg and bg processes in the sga struct kcrf_alfs_info_ (these are not exposed). Anyway, that's a rabbit hole for another day ... 🕵️‍♂️

July 24, 2025 at 8:52 AM

Christoph Lutz

@christophlutz.bsky.social

1/7
On Exadata with pmemlog, Fast Log File Sync dynamically tunes the log file sync sleep duration to balance responsiveness vs CPU overhead (spinning after wakeup).

Oracle tracks three wait variants via different session stat counters:

1. Sleep
2. Spin
3. Backoff Sleeps

July 24, 2025 at 8:51 AM

Christoph Lutz

@christophlutz.bsky.social

If you've ever felt the need to manually control adaptive lgwr features, this gdb script's got you covered: t.ly/lJIW1 😎

It lets you enable and disable adaptive scalable lgwr, fast log file sync, and log parallelism. Highly experimental, of course!

Example 👇

July 23, 2025 at 10:41 AM

Christoph Lutz

@christophlutz.bsky.social

Geeky Sunday mission accomplished… 💥✌️😎

The Fast Sync sleep duration used during log file sync waits is an adaptive moving average calculated every 3 sec by ckpt.

For the curious, more details in this python script: tinyurl.com/mrycy7zh

More explanations will follow some other time.

July 13, 2025 at 3:46 PM

Christoph Lutz

@christophlutz.bsky.social

Weekend plan: figure out how Fast Sync calculates the adaptive sleep duration used during log file sync waits. Answer is in the numbers below ... 🤔

July 11, 2025 at 6:54 AM

Christoph Lutz

@christophlutz.bsky.social

Haven't used good old perf for a while and forgot how cool it actually is. Wondering what process and code paths on your system modify a memory address?

perf's got you covered:

perf record -a -g -e mem:<addr>:w

July 10, 2025 at 9:10 AM

Christoph Lutz

@christophlutz.bsky.social

7/10
When multiple strands are active, Oracle tends to deactivate them fairly aggressively: lgwr disables one strand on every 10th log write (hard-coded in kcrfw_redo_write_driver).

Example:

June 16, 2025 at 6:06 AM
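
The "one strand on every 10th log write" rule can be simulated in a few lines. This is a sketch under assumptions: the interval of 10 is from the post, while the floor of one active strand and the absence of any re-activation are simplifications, and the function name is hypothetical.

```python
DEACTIVATE_EVERY = 10  # interval mentioned in the post


def simulate_deactivation(active_strands: int, log_writes: int) -> list[int]:
    """Return the active strand count after each log write."""
    history = []
    for write_nr in range(1, log_writes + 1):
        # Assumption: deactivation only applies while more than one strand is active.
        if write_nr % DEACTIVATE_EVERY == 0 and active_strands > 1:
            active_strands -= 1
        history.append(active_strands)
    return history


if __name__ == "__main__":
    counts = simulate_deactivation(active_strands=4, log_writes=30)
    print(counts[9], counts[19], counts[29])
```

Starting from 4 active strands, this model is back to a single strand after 30 log writes, which shows how quickly the count shrinks without new contention.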

Christoph Lutz

@christophlutz.bsky.social

5/10
With multiple active strands, the strand in step 1 is chosen randomly:
strand = rand_nr % active_strands.

If the RAL get fails (in 1. or 2.), rand_nr is incremented to retry with the next strand (wrapping to strand 0 if needed).

Examples:

June 16, 2025 at 6:06 AM
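
The strand selection and retry rule can be sketched as follows. The modulo selection and the increment on a failed RAL get come from the post; the callable try_get_ral, the function name, and the limit of one full pass over the strands are assumptions for illustration.

```python
from typing import Callable, Optional


def pick_strand(rand_nr: int,
                active_strands: int,
                try_get_ral: Callable[[int], bool]) -> Optional[int]:
    """Pick a strand as rand_nr % active_strands, moving to the next strand on failure.

    try_get_ral is a hypothetical stand-in for the redo allocation latch get.
    Returns the strand whose latch was obtained, or None after one full pass.
    """
    for _ in range(active_strands):
        strand = rand_nr % active_strands
        if try_get_ral(strand):
            return strand
        rand_nr += 1  # retry with the next strand; the modulo wraps to strand 0
    return None


if __name__ == "__main__":
    # With 4 active strands and rand_nr 7, strand 3 is tried first.
    # Its latch is "busy" here, so the retry wraps around to strand 0.
    print(pick_strand(7, 4, lambda strand: strand != 3))
```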
