Lightnews — Scholar-powered news

Ken Shirriff's blog

@righto.com.web.brid.gy

Computer history, restoring vintage computers, IC reverse engineering, and whatever

[bridged from https://righto.com/ on the web: https://fed.brid.gy/web/righto.com ]

The stack circuitry of the Intel 8087 floating point chip, reverse-engineered

Early microprocessors were very slow when operating with floating-point numbers. But in 1980, Intel introduced the 8087 floating-point coprocessor, performing floating-point operations up to 100 times faster. This was a huge benefit for IBM PC applications such as AutoCAD, spreadsheets, and flight simulators. The 8087 was so effective that today's computers still use a floating-point system based on the 8087.1 The 8087 was an extremely complex chip for its time, containing somewhere between 40,000 and 75,000 transistors, depending on the source.2

To explore how the 8087 works, I opened up a chip and took numerous photos of the silicon die with a microscope. Around the edges of the die, you can see the hair-thin bond wires that connect the chip to its 40 external pins. The complex patterns on the die are formed by its metal wiring, as well as the polysilicon and silicon underneath. The bottom half of the chip is the "datapath", the circuitry that performs calculations on 80-bit floating point values. At the left of the datapath, a constant ROM holds important constants such as π. At the right are the eight registers that form the stack, along with the stack control circuitry.

Die of the Intel 8087 floating point unit chip, with main functional blocks labeled. The die is 5mm×6mm.

The chip's instructions are defined by the large microcode ROM in the middle. This ROM is very unusual; it is semi-analog, storing two bits per transistor by using four transistor sizes. To execute a floating-point instruction, the 8087 decodes the instruction and the microcode engine starts executing the appropriate micro-instructions from the microcode ROM. The decode circuitry to the right of the ROM generates the appropriate control signals from each micro-instruction. The bus registers and control circuitry handle interactions with the main 8086 processor and the rest of the system.
Finally, the bias generator uses a charge pump to create a negative voltage to bias the chip's substrate, the underlying silicon.

The stack registers and control circuitry (in red above) are the subject of this blog post. Unlike most processors, the 8087 organizes its registers in a stack, with instructions operating on the top of the stack. For instance, the square root instruction replaces the value on the top of the stack with its square root. You can also access a register relative to the top of the stack, for instance, adding the top value to the value two positions down from the top. The stack-based architecture was intended to improve the instruction set, simplify compiler design, and make function calls more efficient, although it didn't work as well as hoped.

The stack on the 8087. From _The 8087 Primer_, page 60.

The diagram above shows how the stack operates. The stack consists of eight registers, with the Stack Top (ST) indicating the current top of the stack. To push a floating-point value onto the stack, the Stack Top is decremented and then the value is stored in the new top register. A pop is performed by copying the value from the stack top and then incrementing the Stack Top. In comparison, most processors specify registers directly, so register 2 is always the same register.

## The registers

The stack registers occupy a substantial area on the die of the 8087 because floating-point numbers take many bits. A floating-point number consists of a fractional part (sometimes called the mantissa or significand), along with the exponent part; the exponent allows floating-point numbers to cover a range from extremely small to extremely large. In the 8087, floating-point numbers are 80 bits: 64 bits of significand, 15 bits of exponent, and a sign bit. An 80-bit register was very large in the era of 8-bit or 16-bit computers; the eight registers in the 8087 would be equivalent to 40 registers in the 8086 processor.
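The push, pop, and top-relative addressing described above can be sketched as a small Python model. This is only an illustration of the behavior, not the chip's implementation: the class name `X87Stack` and its methods are hypothetical, and Python floats stand in for the 80-bit values.

```python
class X87Stack:
    """Toy model of the 8087 register stack: eight slots plus a 3-bit Stack Top."""

    SIZE = 8

    def __init__(self):
        self.regs = [0.0] * self.SIZE  # Python floats stand in for 80-bit values
        self.top = 0                   # Stack Top (ST) pointer, always 0..7

    def push(self, value):
        # Decrement the Stack Top (wrapping modulo 8), then store in the new top.
        self.top = (self.top - 1) % self.SIZE
        self.regs[self.top] = value

    def pop(self):
        # Copy the value from the top, then increment the Stack Top.
        value = self.regs[self.top]
        self.top = (self.top + 1) % self.SIZE
        return value

    def st(self, i):
        # Register i positions down from the top; the physical register
        # this refers to changes whenever the Stack Top moves.
        return self.regs[(self.top + i) % self.SIZE]


stack = X87Stack()
for v in (1.0, 2.0, 3.0):
    stack.push(v)
top_plus_two = stack.st(0) + stack.st(2)  # top value plus the value two positions down
```

The final line mirrors the example of adding the top value to the value two positions down; because addressing is relative, the same expression refers to different physical registers after the next push or pop.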
The registers in the 8087 form an 8×80 grid of cells. The close-up shows an 8×8 block. I removed the metal layer with acid to reveal the underlying silicon circuitry. The registers store each bit in a static RAM cell. Each cell has two inverters connected in a loop. This circuit forms a stable feedback loop, with one inverter on and one inverter off. Depending on which inverter is on, the circuit stores a 0 or a 1. To write a new value into the circuit, one of the lines is pulled low, flipping the loop into the desired state. The trick is that each inverter uses a very weak transistor to pull the output high, so its output is easily overpowered to change the state. Two inverters in a loop can store a 0 or a 1. These inverter pairs are arranged in an 8 × 80 grid that implements eight words of 80 bits. Each of the 80 rows has two _bitlines_ that provide access to a bit. The bitlines provide both read and write access to a bit; the pair of bitlines allows either inverter to be pulled low to store the desired bit value. Eight vertical _wordlines_ enable access to one word, one column of 80 bits. Each wordline turns on 160 pass transistors, connecting the bitlines to the inverters in the selected column. Thus, when a wordline is enabled, the bitlines can be used to read or write that word. Although the chip looks two-dimensional, it actually consists of multiple layers. The bottom layer is silicon. The pinkish regions below are where the silicon has been "doped" to change its electrical properties, making it an active part of the circuit. The doped silicon forms a grid of horizontal and vertical wiring, with larger doped regions in the middle. On top of the silicon, polysilicon wiring provides two functions. First, it provides a layer of wiring to connect the circuit. But more importantly, when polysilicon crosses doped silicon, it forms a transistor. The polysilicon provides the gate, turning the transistor on and off. 
In this photo, the polysilicon is barely visible, so I've highlighted part of it in red. Finally, horizontal metal wires provide a third layer of interconnecting wiring. Normally, the metal hides the underlying circuitry, so I removed the metal with acid for this photo. I've drawn blue lines to represent the metal layer. Contacts provide connections between the various layers.

A close-up of a storage cell in the registers. The metal layer and most of the polysilicon have been removed to show the underlying silicon.

The layers combine to form the inverters and selection transistors of a memory cell, indicated with the dotted line below. There are six transistors (yellow), where polysilicon crosses doped silicon. Each inverter has a transistor that pulls the output low and a weak transistor to pull the output high. When the word line (vertical polysilicon) is active, it connects the selected inverters to the bit lines (horizontal metal) through the two selection transistors. This allows the bit to be read or written.

The function of the circuitry in a storage cell.

Each register has two tag bits associated with it, an unusual form of metadata to indicate if the register is empty, contains zero, contains a valid value, or contains a special value such as infinity. The tag bits are used to optimize performance internally and are mostly irrelevant to the programmer. As well as being accessed with a register, the tag bits can be accessed in parallel as a 16-bit "Tag Word". This allows the tags to be saved or loaded as part of the 8087's state, for instance, during interrupt handling.

## The decoder

The decoder circuit, wedged into the middle of the register file, selects one of the registers. A register is specified internally with a 3-bit value. The decoder circuit energizes one of the eight register select lines based on this value. The decoder circuitry is straightforward: it has eight 3-input NOR gates to match one of the eight bit patterns.
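To make the NOR-gate matching concrete, here is a minimal Python sketch of a 3-to-8 decoder built from eight 3-input NOR gates. It assumes each gate is fed the true or complemented form of each address bit, which is the usual way to build such a decoder but is an assumption here, not something traced from the die. The function names `nor` and `decode` are hypothetical.

```python
def nor(*inputs):
    """A NOR gate: the output is 1 only when every input is 0."""
    return int(not any(inputs))


def decode(addr):
    """Return the eight select lines for a 3-bit register number (0..7)."""
    bits = [(addr >> k) & 1 for k in range(3)]
    lines = []
    for pattern in range(8):
        # Each gate input is the address bit or its complement, chosen so that
        # all three inputs are 0 exactly when the address equals the pattern.
        gate_inputs = [bits[k] ^ ((pattern >> k) & 1) for k in range(3)]
        lines.append(nor(*gate_inputs))
    return lines


select_lines = decode(5)  # only line 5 is energized
```

Exactly one gate sees three zero inputs for any address, so exactly one select line goes high.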
The select line is then powered through a high-current driver that uses large transistors. (In the photo below, you can compare the large serpentine driver transistors to the small transistors in a bit cell.) The decoder circuitry has eight similar blocks to drive the eight select lines.

The decoder has an interesting electrical optimization. As shown earlier, the register select lines are eight polysilicon lines running vertically, the length of the register file. Unfortunately, polysilicon has fairly high resistance, better than silicon but much worse than metal. The problem is that the resistance of a long polysilicon line will slow down the system. That is, the capacitance of transistor gates in combination with high resistance causes an RC (resistive-capacitive) delay in the signal. The solution is that the register select lines also run in the metal layer, a second set of lines immediately to the right of the register file. These lines branch off from the register file about 1/3 of the way down, run to the bottom, and then connect back to the polysilicon select lines at the bottom. This reduces the maximum resistance through a select line, increasing the speed.

A diagram showing how 8 metal lines run parallel to the main select lines. The register file is much taller than shown; the middle has been removed to make the diagram fit.

## The stack control circuitry

A stack needs more control circuitry than a regular register file, since the circuitry must keep track of the position of the top of the stack.3 The control circuitry increments and decrements the top of stack (TOS) pointer as values are pushed or popped (purple).4 Moreover, an 8087 instruction can access a register based on its offset, for instance the third register from the top. To support this, the control circuitry can temporarily add an offset to the top of stack position (green).
A multiplexer (red) selects either the top of stack or the adder output, and feeds it to the decoder (blue), which selects one of the eight stack registers in the register file (yellow), as described earlier. The register stack in the 8087. Adapted from Patent USRE33629E. I don't know what the GRX field is. I also don't know why this shows a subtractor and not an adder. The physical implementation of the stack circuitry is shown below. The logic at the top selects the stack operation based on the 16-bit micro-instruction.5 Below that are the three latches that hold the top of stack value. (The large white squares look important, but they are simply "jumpers" from the ground line to the circuitry, passing under metal wires.) The stack control circuitry. The blue regions on the right are oxide residue that remained when I dissolved the metal rail for the 5V power. The three-bit adder is at the bottom, along with the multiplexer. You might expect the adder to use a simple "full adder" circuit. Instead, it is a faster carry-lookahead adder. I won't go into details here, but the summary is that at each bit position, an AND gate produces a Carry Generate signal while an XOR gate produces a Carry Propagate signal. Logic gates combine these signals to produce the output bits in parallel, avoiding the slowdown of the carry rippling through the bits. The incrementer/decrementer uses a completely different approach. Each of the three bits uses a toggle flip-flop. A few logic gates determine if each bit should be toggled or should keep its previous value. For instance, when incrementing, the top bit is toggled if the lower bits are 11 (e.g. incrementing from 011 to 100). For decrementing, the top bit is toggled if the lower bits are 00 (e.g. 100 to 011). Simpler logic determines if the middle bit should be toggled. The bottom bit is easier, toggling every time whether incrementing or decrementing. The schematic below shows the circuitry for one bit of the stack. 
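The adder and the incrementer/decrementer described above can be sketched at the logic level in Python. This is a hedged illustration, not the actual gate network: the function names `cla_add3` and `step_tos` are hypothetical, the adder assumes a carry-in of 0, and the toggle rule for the middle bit is my inference (the text gives the rules only for the top and bottom bits).

```python
def cla_add3(a, b):
    """3-bit carry-lookahead add, modulo 8 (carry-in assumed to be 0)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(3)]  # Carry Generate (AND)
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(3)]  # Carry Propagate (XOR)
    # Each carry is computed directly from g and p rather than rippling.
    c1 = g[0]
    c2 = g[1] | (p[1] & g[0])
    s0, s1, s2 = p[0], p[1] ^ c1, p[2] ^ c2
    return s0 | (s1 << 1) | (s2 << 2)


def step_tos(tos, increment):
    """Advance a 3-bit pointer by one, using per-bit toggle decisions."""
    b0, b1, b2 = tos & 1, (tos >> 1) & 1, (tos >> 2) & 1
    if increment:
        t1 = b0        # assumed rule: middle bit toggles when the low bit is 1
        t2 = b0 & b1   # top bit toggles when the lower bits are 11
    else:
        t1 = b0 ^ 1                # assumed rule: toggles when the low bit is 0
        t2 = (b0 ^ 1) & (b1 ^ 1)   # top bit toggles when the lower bits are 00
    t0 = 1             # the bottom bit toggles every time
    return (b0 ^ t0) | ((b1 ^ t1) << 1) | ((b2 ^ t2) << 2)


after_push = step_tos(0b100, increment=False)  # 100 -> 011, as in the text
third_from_top = cla_add3(after_push, 2)       # register number for an offset of 2
```

Both functions wrap around modulo 8, which matches a 3-bit pointer into eight registers.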
Each bit is implemented with a moderately complicated flip-flop that can be cleared, loaded with a value, or toggled, based on control signals from the microcode. The flip-flop is constructed from two set-reset (SR) latches. Note that the flip-flop outputs are crossed when fed back to the input, providing the inversion for the toggle action. At the right, the multiplexer selects either the register value or the sum from the adder (not shown), generating the signals to the decoder.

Schematic of one bit of the stack.

## Drawbacks of the stack approach

According to the designers of the 8087,7 the main motivation for using a stack rather than a flat register set was that instructions didn't have enough bits to address multiple register operands. In addition, a stack has "advantages over general registers for expression parsing and nested function calls." That is, a stack works well for a mathematical expression since sub-expressions can be evaluated on the top of the stack. And for function calls, you avoid the cost of saving registers to memory, since the subroutine can use the stack without disturbing the values underneath. At least that was the idea.

The main problem is "stack overflow". The 8087's stack has eight entries, so if you push a ninth value onto the stack, the stack will overflow. Specifically, the top-of-stack pointer will wrap around, obliterating the bottom value on the stack. The 8087 is designed to detect a stack overflow using the register tags: pushing a value to a non-empty register triggers an invalid operation exception.6 The designers expected that stack overflow would be rare and could be handled by the operating system (or library code). After detecting a stack overflow, the software should dump the existing stack to memory to provide the illusion of an infinite stack. Unfortunately, bad design decisions made it difficult "both technically and commercially" to handle stack overflow.
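The tag-based overflow check can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the names `Tag`, `TaggedStack`, and `InvalidOperation` are hypothetical, the exception is raised before any state changes (what the real chip does to the register on overflow is not modeled), and "special" covers only infinity and NaN here.

```python
import math
from enum import Enum


class Tag(Enum):
    # The four tag states named in the text; numeric encodings are not modeled.
    EMPTY = "empty"
    ZERO = "zero"
    VALID = "valid"
    SPECIAL = "special"


class InvalidOperation(Exception):
    """Stands in for the 8087's invalid operation exception."""


def classify(value):
    if value == 0:
        return Tag.ZERO
    if math.isinf(value) or math.isnan(value):
        return Tag.SPECIAL
    return Tag.VALID


class TaggedStack:
    def __init__(self):
        self.regs = [0.0] * 8
        self.tags = [Tag.EMPTY] * 8
        self.top = 0

    def push(self, value):
        new_top = (self.top - 1) % 8
        if self.tags[new_top] is not Tag.EMPTY:
            # The pointer has wrapped onto a register that is still in use.
            raise InvalidOperation("stack overflow")
        self.top = new_top
        self.regs[new_top] = value
        self.tags[new_top] = classify(value)

    def pop(self):
        if self.tags[self.top] is Tag.EMPTY:
            raise InvalidOperation("stack underflow")
        value = self.regs[self.top]
        self.tags[self.top] = Tag.EMPTY
        self.top = (self.top + 1) % 8
        return value
```

In this model, eight pushes succeed and the ninth raises `InvalidOperation`, since the wrapped pointer lands on a register whose tag is no longer empty.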
One of the 8087's designers (Kahan) attributes the 8087's stack problems to the time difference between California, where the designers lived, and Israel, where the 8087 was implemented. Due to a lack of communication, each team thought the other was implementing the overflow software. It wasn't until the 8087 was in production that they realized that "it might not be possible to handle 8087 stack underflow/overflow in a reasonable way. It's not impossible, just impossible to do it in a reasonable way."

As a result, the stack was largely a problem rather than a solution. Most 8087 software saved the full stack to memory before performing a function call, creating more memory traffic. Moreover, compilers turned out to work better with regular registers than a stack, so compiler writers awkwardly used the stack to emulate regular registers. The `GCC` compiler reportedly needs 3000 lines of extra code to support the x87 stack. In the 1990s, Intel introduced a new floating-point system called SSE, followed by AVX in 2011. These systems use regular (non-stack) registers and provide parallel operations for higher performance, making the 8087's stack instructions largely obsolete.

## The success of the 8087

At the start, Intel was unenthusiastic about producing the 8087, viewing it as unlikely to be a success. John Palmer, a principal architect of the chip, had little success convincing skeptical Intel management that the market for the 8087 was enormous. Eventually, he said, "I'll tell you what. I'll relinquish my salary, provided you'll write down your number of how many you expect to sell, then give me a dollar for every one you sell beyond that."7 Intel didn't agree to the deal, which would have made a fortune for Palmer, but they reluctantly agreed to produce the chip.
Intel's Santa Clara engineers shunned the 8087, considering it unlikely to work: the 8087 would be two to three times more complex than the 8086, with a die so large that a wafer might not have a single working die. Instead, Rafi Nave, at Intel's Israel site, took on the risky project: “Listen, everybody knows it's not going to work, so if it won't work, I would just fulfill their expectations or their assessment. If, by chance, it works, okay, then we'll gain tremendous respect and tremendous breakthrough on our abilities.”

A small team of seven engineers developed the 8087 in Israel. They designed the chip on Mylar sheets: a millimeter on Mylar represented a micron on the physical chip. The drawings were then digitized on a Calma system by clicking on each polygon to create the layout. When the chip was moved into production, the yield was very low but better than feared: two working dies per four-inch wafer.

The 8087 ended up being a large success, said to have been Intel's most profitable product line at times. The success of the 8087 (along with the 8088) cemented the reputation of Intel Israel, which eventually became Israel's largest tech employer. The benefits of floating-point hardware proved to be so great that Intel integrated the floating-point unit into later processors starting with the 80486 (1989). Nowadays, most modern computers, from cellphones to mainframes, provide floating point based on the 8087, so I consider the 8087 one of the most influential chips ever created.

For more, follow me on Bluesky (@righto.com), Mastodon (@kenshirriff@oldbytes.space), or RSS. I wrote some articles about the 8087 a few years ago, including the die, the ROM, the bit shifter, and the constants, so you may have seen some of this material before.

## Notes and references

1. Most computers now use the IEEE 754 floating-point standard, which is based on the 8087. This standard has been awarded a milestone in computation.

2. Curiously, reliable sources differ on the number of transistors in the 8087 by almost a factor of 2. Intel says 40,000, as does designer William Kahan (link). But in A Numeric Data Processor, designers Rafi Nave and John Palmer wrote that the chip contains "the equivalent of over 65,000 devices" (whatever "equivalent" means). This number is echoed by a contemporary article in _Electronics_ (1980) that says "over 65,000 H-MOS transistors on a 78,000-mil² die." Many other sources, such as Upgrading & Repairing PCs, specify 45,000 transistors. Designer Rafi Nave stated that the 8087 has 63,000 or 64,000 transistors if you count the ROM transistors directly, but if you count ROM transistors as equivalent to two transistors, then you get about 75,000 transistors.

3. The 8087 has a 16-bit Status Word that contains the stack top pointer, exception flags, the four-bit condition code, and other values. Although the Status Word appears to be a 16-bit register, it is not implemented as a register. Instead, parts of the Status Word are stored in various places around the chip: the stack top pointer is in the stack circuitry, the exception flags are part of the interrupt circuitry, the condition code bits are next to the datapath, and so on. When the Status Word is read or written, these various circuits are connected to the 8087's internal data bus, making the Status Word appear to be a monolithic entity. Thus, the stack circuitry includes support for reading and writing it.

4. Intel filed several patents on the 8087, including Numeric data processor, another Numeric data processor, Programmable bidirectional shifter, Fraction bus for use in a numeric data processor, and System bus arbitration, circuitry and methodology.

5. I started looking at the stack in detail to reverse engineer the micro-instruction format and determine how the 8087's microcode works. I'm working with the "Opcode Collective" on Discord on this project, but progress is slow due to the complexity of the micro-instructions.

6. The 8087 detects stack underflow in a similar manner. If you pop more values from the stack than are present, the tag will indicate that the register is empty and shouldn't be accessed. This triggers an invalid operation exception.

7. The 8087 is described in detail in The 8086 Family User's Manual, Numerics Supplement. An overview of the stack is on page 60 of _The 8087 Primer_ by Palmer and Morse. More details are in Kahan's On the Advantages of the 8087's Stack, an unpublished course note (maybe for CS 279?) with a date of Nov 2, 1990 or perhaps August 23, 1994. Kahan discusses why the 8087's design makes it hard to handle stack overflow in How important is numerical accuracy, Dr. Dobbs, Nov. 1997. Another information source is the Oral History of Rafi Nave.

www.righto.com

December 10, 2025 at 8:10 AM

Unusual circuits in the Intel 386's standard cell logic

I've been studying the standard cell circuitry in the Intel 386 processor recently. The 386, introduced in 1985, was Intel's most complex processor at the time, containing 285,000 transistors. Intel's existing design techniques couldn't handle this complexity and the chip began to fall behind schedule. To meet the schedule, the 386 team started using a technique called standard cell logic. Instead of laying out each transistor manually, the layout process was performed by a computer. The idea behind standard cell logic is to create standardized circuits (standard cells) for each type of logic element, such as an inverter, NAND gate, or latch. You feed your circuit description into software that selects the necessary cells, positions these cells into columns, and then routes the wiring between the cells. This "automatic place and route" process creates the chip layout much faster than manual layout. However, switching to standard cells was a risky decision since if the software couldn't create a dense enough layout, the chip couldn't be manufactured. But in the end, the 386 finished ahead of schedule, an almost unheard-of accomplishment.1 The 386's standard cell circuitry contains a few circuits that I didn't expect. In this blog post, I'll take a quick look at some of these circuits: surprisingly large multiplexers, a transistor that doesn't fit into the standard cell layout, and inverters that turned out not to be inverters. (If you want more background on standard cells in the 386, see my earlier post, "Reverse engineering standard cell logic in the Intel 386 processor".) The photo below shows the 386 die with the automatic-place-and-route regions highlighted; I'm focusing on the red region in the lower right. These blocks of logic have cells arranged in rows, giving them a characteristic striped appearance. 
The dark stripes are the transistors that make up the logic gates, while the lighter regions between the stripes are the "routing channels" that hold the wiring that connects the cells. In comparison, functional blocks such as the datapath on the left and the microcode ROM in the lower right were designed manually to optimize density and performance, giving them a more solid appearance.

The 386 die with the standard-cell regions highlighted.

As for other features on the chip, the black circles around the border are bond wire connections that go to the chip's external pins. The chip has two metal layers, a small number by modern standards, but a jump from the single metal layer of earlier processors such as the 286. (Providing two layers of metal made automated routing practical: one layer can hold horizontal wires while the other layer can hold vertical wires.) The metal appears white in larger areas, but purplish where circuitry underneath roughens its surface. The underlying silicon and the polysilicon wiring are obscured by the metal layers.

## The giant multiplexers

The standard cell circuitry that I'm examining (red box above) is part of the control logic that selects registers while executing an instruction. You might think that it is easy to select which registers take part in an instruction, but due to the complexity of the x86 architecture, it is more difficult. One problem is that a 32-bit register such as EAX can also be treated as the 16-bit register AX, or two 8-bit registers AH and AL. A second problem is that some instructions include a "direction" bit that switches the source and destination registers. Moreover, sometimes the register is specified by bits in the instruction, but in other cases, the register is specified by the microcode. Due to these factors, selecting the registers for an operation is a complicated process with many cases, using control bits from the instruction, from the microcode, and from other sources.
Three registers need to be selected for an operation—two source registers and a destination register—and there are about 17 cases that need to be handled. Registers are specified with 7-bit control signals that select one of the 30 registers and control which part of the register is accessed. With three control signals, each 7 bits wide, and about 17 cases for each, you can see that the register control logic is large and complicated. (I wrote more about the 386's registers here.) I'm still reverse engineering the register control logic, so I won't go into details. Instead, I'll discuss how the register control circuit uses multiplexers, implemented with standard cells. A multiplexer is a circuit that combines multiple input signals into a single output by selecting one of the inputs.2 A multiplexer can be implemented with logic gates, for instance, by ANDing each input with the corresponding control line, and then ORing the results together. However, the 386 uses a different approach—CMOS switches—that avoids a large AND/OR gate. Schematic of a CMOS switch. The schematic above shows how a CMOS switch is constructed from two MOS transistors. When the two transistors are on, the output is connected to the input, but when the two transistors are off, the output is isolated. An NMOS transistor is turned on when its input is high, but a PMOS transistor is turned on when its input is _low_. Thus, the switch uses two control inputs, one inverted. The motivation for using two transistors is that an NMOS transistor is better at pulling the output low, while a PMOS transistor is better at pulling the output high, so combining them yields the best performance.3 Unlike a logic gate, the CMOS switch has no amplification, so a signal is weakened as it passes through the switch. As will be seen below, inverters can be used to amplify the signal. The image below shows how CMOS switches appear under the microscope. 
This image is very hard to interpret because the two layers of metal on the 386 are packed together densely, but you can see that some wires run horizontally and others run vertically. The bottom layer of metal (called M1) runs vertically in the routing area, as well as providing internal wiring for a cell. The top layer of metal (M2) runs horizontally; unlike M1, the M2 wires can cross a cell. The large circles are vias that connect the M1 and M2 layers, while the small circles are connections between M1 and polysilicon or M1 and silicon. The central third of the image is a column of standard cells with two CMOS switches outlined in green. The cells are bordered by the vertical ground rail and +5V rail that power the cells. The routing areas are on either side of the cells, holding the wiring that connects the cells. Two CMOS switches, highlighted in green. The lower switch is flipped vertically compared to the upper switch. Removing the metal layers reveals the underlying silicon with a layer of polysilicon wiring on top. The doped silicon regions show up as dark outlines. I've drawn the polysilicon in green; it forms a transistor (brighter green) when it crosses doped silicon. The metal ground and power lines are shown in blue and red, respectively, with other metal wiring in purple. The black dots are vias between layers. Note how metal wiring (purple) and polysilicon wiring (green) are combined to route signals within the cell. Although this standard cell is complicated, the important thing is that it only needs to be designed once. The standard cells for different functions are all designed to have the same width, so the cells can be arranged in columns, snapped together like Lego bricks. A diagram showing the silicon for a standard-cell switch. The polysilicon is shown in green. The bottom metal is shown in blue, red, and purple. To summarize, this switch circuit allows the input to be connected to the output or disconnected, controlled by the select signal. 
This switch is more complicated than the earlier schematic because it includes two inverters to amplify the signal. The data input and the two select lines are connected to the polysilicon (green); the cell is designed so these connections can be made on either side. At the top, the input goes through a standard two-transistor inverter. The lower left has two transistors, combining the NMOS half of an inverter with the NMOS half of the switch. A similar circuit on the right combines the PMOS part of an inverter and switch. However, because PMOS transistors are weaker, this part of the circuit is duplicated. A multiplexer is constructed by combining multiple switches, one for each input. Turning on one switch will select the corresponding input. For instance, a four-to-one multiplexer has four switches, so it can select one of the four inputs. A four-way multiplexer constructed from CMOS switches and individual transistors. The schematic above shows a hypothetical multiplexer with four inputs. One optimization is that if an input is always 0, the PMOS transistor can be omitted. Likewise, if an input is always 1, the NMOS transistor can be omitted. One set of select lines is activated at a time to select the corresponding input. The pink circuit selects 1, green selects input A, yellow selects input B, and blue selects 0. The multiplexers in the 386 are similar, but have more inputs. The diagram below shows how much circuitry is devoted to multiplexers in this block of standard cells. The green, purple, and red cells correspond to the multiplexers driving the three register control outputs. The yellow cells are inverters that generate the inverted control signals for the CMOS switches. This diagram also shows how the automatic layout of cells results in a layout that appears random. A block of standard-cell logic with multiplexers highlighted. The metal and polysilicon layers were removed for this photo, revealing the silicon transistors. 
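The four-input multiplexer described above can be modeled in a few lines of Python as a shared output node with four legs, each of which either drives the node or is switched off. This is a behavioral sketch only: the names `node_value` and `mux4` are hypothetical, `None` stands for an undriven (floating) node, and the signal inversion introduced by the buffering inverters is ignored.

```python
def node_value(legs):
    """legs is a list of (enabled, value) pairs sharing one output node.

    Returns the driven value, or None if every leg is off (node floating).
    Raises ValueError if enabled legs try to drive different values.
    """
    driven = {value for enabled, value in legs if enabled}
    if len(driven) > 1:
        raise ValueError("contention: two legs drive opposite values")
    return driven.pop() if driven else None


def mux4(sel_one, sel_a, sel_b, sel_zero, a, b):
    return node_value([
        (sel_one, 1),   # constant 1: only a PMOS pull-up is needed
        (sel_a, a),     # full CMOS switch passing input A
        (sel_b, b),     # full CMOS switch passing input B
        (sel_zero, 0),  # constant 0: only an NMOS pull-down is needed
    ])


picked_b = mux4(False, False, True, False, a=0, b=1)  # select input B
```

The model also shows why only one set of select lines should be active at a time: enabling two legs with different values is a short between drivers, which the sketch reports as an error.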
## The misplaced transistor

The idea of standard-cell logic is that standardized cells are arranged in columns. The space between the cells is the "routing channel", holding the wiring that links the cells. The 386 circuitry follows this layout, except for one single transistor, sitting between two columns of cells.

The "misplaced" transistor, indicated by the arrow. The irregular green regions are oxide that was incompletely removed.

I wrote some software tools to help me analyze the standard cells. Unfortunately, my tools assumed that all the cells were in columns, so this one wayward transistor caused me considerable inconvenience. The transistor turns out to be a PMOS transistor, pulling a signal high as part of a multiplexer. But why is this transistor out of place? My hypothesis is that the transistor is a bug fix. Regenerating the cell layout was very costly, taking many hours on an IBM mainframe computer. Presumably, someone found that they could just stick the necessary transistor into an unused spot in the routing channel, manually add the necessary wiring, and avoid the delay of regenerating all the cells.

## The fake inverter

The simplest CMOS gate is the inverter, with an NMOS transistor to pull the output low and a PMOS transistor to pull the output high. The standard cell circuitry that I examined contains over a hundred inverters of various sizes. (Performance is improved by using inverters that aren't too small but also aren't larger than necessary for a particular circuit. Thus, the standard cell library includes inverters of multiple sizes.) The image below shows a medium-sized standard-cell inverter under the microscope. For this image, I removed the two metal layers with acid to show the underlying polysilicon (bright green) and silicon (gray). The quality of this image is poor; it is difficult to remove the metal without destroying the polysilicon, but the diagram below should clarify the circuit.
The inverter has two transistors: a PMOS transistor connected to +5 volts to pull the output high when the input is 0, and an NMOS transistor connected to ground to pull the output low when the input is 1. (The PMOS transistor needs to be larger because PMOS transistors don't function as well as NMOS transistors due to silicon physics.) An inverter as seen on the die. The corresponding standard cell is shown below. The polysilicon input line plays a key role: where it crosses the doped silicon, a transistor gate is formed. To make the standard cell more flexible, the input to the inverter can be connected on either the left or the right; in this case, the input is connected on the right and there is no connection on the left. The inverter's output can be taken from the polysilicon on the upper left or the right, but in this case, it is taken from the upper metal layer (not shown). The power, ground, and output lines are in the lower metal layer, which I have represented by the thin red, blue, and yellow lines. The black circles are connections between the metal layer and the underlying silicon. This inverter appears dozens of times in the circuitry. However, I came across a few inverters that didn't make sense. The problem was that the inverter's output was connected to the output of a multiplexer. Since an inverter is either on or off, its value would clobber the output of the multiplexer.4 This didn't make any sense. I double- and triple-checked the wiring to make sure I hadn't messed up. After more investigation, I found another problem: the input to a "bad" inverter didn't make sense either. The input consisted of two signals shorted together, which doesn't work. Finally, I realized what was going on. A "bad inverter" has the exact silicon layout of an inverter, but it wasn't an inverter: it was independent NMOS and PMOS transistors with separate inputs. Now it all made sense. With two inputs, the input signals were independent, not shorted together. 
And since the transistors were controlled separately, the NMOS transistor could pull the output low in some circumstances, the PMOS transistor could pull the output high in other circumstances, or both transistors could be off, allowing the multiplexer's output to be used undisturbed. In other words, the "inverter" was just two more cases for the multiplexer. The "bad" inverter. (Image is flipped vertically for comparison with the previous inverter.) If you compare the "bad inverter" cell below with the previous cell, they look _almost_ the same, but there are subtle differences. First, the gates of the two transistors are connected in the real inverter, but disconnected by a small gap in the transistor pair. I've indicated this gap in the photo above; it is hard to tell if the gap is real or just an imaging artifact, so I didn't spot it. The second difference is that the "fake" inverter has two input connections, one to each transistor, while the inverter has a single input connection. Unfortunately, I assumed that the two connections were just a trick to route the signal across the inverter without requiring an extra wire. In total, this cell was used 32 times as a real inverter and 9 times as independent transistors. ## Conclusions Standard cell logic and automatic place and route have a long history before the 386, back to the early 1970s, so this isn't an Intel invention.5 Nonetheless, the 386 team deserves the credit for deciding to use this technology at a time when it was a risky decision. They needed to develop custom software for their placing and routing needs, so this wasn't a trivial undertaking. This choice paid off and they completed the 386 ahead of schedule. The 386 ended up being a huge success for Intel, moving the x86 architecture to 32 bits and defining the dominant computer architecture for the rest of the 20th century. If you're interested in standard cell logic, I also wrote about standard cell logic in an IBM chip. 
I plan to write more about the 386, so follow me on Mastodon, Bluesky, or RSS for updates. Thanks to Pat Gelsinger and Roxanne Koester for providing helpful papers. For more on the 386 and other chips, follow me on Mastodon (@kenshirriff@oldbytes.space), Bluesky (@righto.com), or RSS. (I've given up on Twitter.) If you want to read more about the 386, I've written about the clock pin, prefetch queue, die versions, packaging, and I/O circuits. ## Notes and references 1. The decision to use automatic place and route is described on page 13 of the Intel 386 Microprocessor Design and Development Oral History Panel, a very interesting document on the 386 with discussion from some of the people involved in its development. ↩ 2. Multiplexers often take a binary control signal to select the desired input. For instance, an 8-to-1 multiplexer selects one of 8 inputs, so a 3-bit control signal can specify the desired input. The 386's multiplexers use a different approach with one control signal per input. One of the 8 control signals is activated to select the desired input. This approach is called a "one-hot encoding" since one control line is activated (hot) at a time. ↩ 3. Some chips, such as the MOS Technology 6502 processor, are built with NMOS technology, without PMOS transistors. Multiplexers in the 6502 use a single NMOS transistor, rather than the two transistors in the CMOS switch. However, the performance of the switch is worse. ↩ 4. One very common circuit in the 386 is a latch constructed from an inverter loop and a switch/multiplexer. The inverter's output and the switch's output are connected together. The trick, however, is that the inverter is constructed from special weak transistors. When the switch is disabled, the inverter's weak output is sufficient to drive the loop. But to write a value into the latch, the switch is enabled and its output overpowers the weak inverter. 
The point of this is that there _are_ circuits where an inverter and a multiplexer have their outputs connected. However, the inverter must be constructed with special weak transistors, which is not the situation that I'm discussing. ↩ 5. I'll provide more history on standard cells in this footnote. RCA patented a bipolar standard cell in 1971, but this was a fixed arrangement of transistors and resistors, more of a gate array than a modern standard cell. Bell Labs researched standard cell layout techniques in the early 1970s, calling them Polycells, including a 1973 paper by Brian Kernighan. By 1979, A Guide to LSI Implementation discussed the standard cell approach and it was described as well-known in this patent application. Even so, Electronics called these design methods "futuristic" in 1980. Standard cells became popular in the mid-1980s as faster computers and improved design software made it practical to produce semi-custom designs that used standard cells. Standard cells made it to the cover of Digital Design in August 1985, and the article inside described numerous vendors and products. Companies like Zymos and VLSI Technology (VTI) focused on standard cells. Traditional companies such as Texas Instruments, NCR, GE/RCA, Fairchild, Harris, ITT, and Thomson introduced lines of standard cell products in the mid-1980s. ↩
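The weak-inverter latch in note 4 can be illustrated with a small drive-strength model. This is a hedged sketch rather than a description of the actual 386 cells: the names `STRONG`, `WEAK`, `resolve`, and `latch_step` are hypothetical, and drive strength is reduced to two integer levels.

```python
STRONG, WEAK = 2, 1  # assumed two-level strength scale, for illustration only


def resolve(drivers):
    """Resolve a shared node from a list of (value, strength) drivers.

    The strongest driver wins. Two equally strong drivers with different
    values are reported as contention. No drivers means a floating node.
    """
    if not drivers:
        return None
    top = max(strength for _, strength in drivers)
    winners = {value for value, strength in drivers if strength == top}
    if len(winners) > 1:
        raise ValueError("contention between equally strong drivers")
    return winners.pop()


def latch_step(stored, enable, data):
    """One update of a latch: weak feedback loop plus an optional strong switch."""
    drivers = [(stored, WEAK)]          # the weak inverter loop holds the old value
    if enable:
        drivers.append((data, STRONG))  # the enabled switch overpowers the loop
    return resolve(drivers)


print(latch_step(stored=0, enable=True, data=1))   # the switch writes a new value
print(latch_step(stored=1, enable=False, data=0))  # the loop keeps the old value
```

In this model, an ordinary full-strength inverter wired to a multiplexer output would be two `STRONG` drivers on one node, which `resolve` rejects whenever they disagree. That is the situation that made the "bad" inverters look impossible until they turned out to be independent transistors.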

www.righto.com

November 23, 2025 at 7:50 AM

Solving the NYTimes Pips puzzle with a constraint solver

The New York Times recently introduced a new daily puzzle called Pips. You place a set of dominoes on a grid, satisfying various conditions. For instance, in the puzzle below, the pips (dots) in the purple squares must sum to 8, there must be fewer than 5 pips in the red square, and the pips in the three green squares must be equal. (It doesn't take much thought to solve this "easy" puzzle, but the "medium" and "hard" puzzles are more challenging.) The New York Times Pips puzzle from Oct 5, 2025 (easy). Hint: What value must go in the three green squares? I was wondering about how to solve these puzzles with a computer. Recently, I saw an article on Hacker News—"Many hard LeetCode problems are easy constraint problems"—that described the benefits and flexibility of a system called a constraint solver. A constraint solver takes a set of constraints and finds solutions that satisfy the constraints: exactly what Pips requires. I figured that solving Pips with a constraint solver would be a good way to learn more about these solvers, but I had several questions. Did constraint solvers require incomprehensible mathematics? How hard was it to express a problem? Would the solver quickly solve the problem, or would it get caught in an exponential search? It turns out that using a constraint solver was straightforward; it took me under two hours from knowing nothing about constraint solvers to solving the problem. The solver found solutions in milliseconds (for the most part). However, there were a few bumps along the way. In this blog post, I'll discuss my experience with the MiniZinc1 constraint modeling system and show how it can solve Pips. ## Approaching the problem Writing a program for a constraint solver is very different from writing a regular program. Instead of telling the computer _how_ to solve the problem, you tell it _what_ you want: the conditions that must be satisfied. The solver then "magically" finds solutions that satisfy the problem. 
To solve the problem, I created an array called `pips` that holds the number of domino pips at each position in the grid. Then, the three constraints for the above problem can be expressed as follows. You can see how the constraints directly express the conditions in the puzzle. constraint pips[1,1] + pips[2,1] == 8; constraint pips[2,3] < 5; constraint all_equal([pips[3,1], pips[3,2], pips[3,3]]); Next, I needed to specify where dominoes could be placed for the puzzle. To do this, I defined an array called `grid` that indicated the allowable positions: 1 indicates a valid position and 0 indicates an invalid position. (If you compare with the puzzle at the top of the article, you can see that the grid below matches its shape.) grid = [| 1,1,0| 1,1,1| 1,1,1|]; I also defined the set of dominoes for the problem above, specifying the number of spots in each half: spots = [|5,1| 1,4| 4,2| 1,3|]; So far, the constraints directly match the problem. However, I needed to write some more code to specify how these pieces interact. But before I describe that code, I'll show a solution. I wasn't sure what to expect: would the constraint solver give me a solution or would it spin forever? It turned out to find the unique solution in 109 milliseconds, printing out the solution arrays. The `pips` array shows the number of pips in each position, while the `dominogrid` array shows which domino (1 through 4) is in each position. pips = [| 4, 2, 0 | 4, 5, 3 | 1, 1, 1 |]; dominogrid = [| 3, 3, 0 | 2, 1, 4 | 2, 1, 4 |]; The text-based solution above is a bit ugly. But it is easy to create graphical output. MiniZinc provides a JavaScript API, so you can easily display solutions on a web page. I wrote a few lines of JavaScript to draw the solution, as shown below. (I just display the numbers since I was too lazy to draw the dots.) 
Solving this puzzle is not too impressive—it's an "easy" puzzle after all—but I'll show below that the solver can also handle considerably more difficult puzzles. Graphical display of the solution. ### Details of the code While the above code specifies a particular puzzle, a bit more code is required to define how dominoes and the grid interact. This code may appear strange because it is implemented as constraints, rather than the procedural operations in a normal program. My main design decision was how to specify the locations of dominoes. I considered assigning a grid position and orientation to each domino, but it seemed inconvenient to deal with multiple orientations. Instead, I decided to position each half of the domino independently, with an `x` and `y` coordinate in the grid.2 I added a constraint that the two halves of each domino had to be in neighboring cells, that is, either the X or Y coordinates had to differ by 1. constraint forall(i in DOMINO) (abs(x[i, 1] - x[i, 2]) + abs(y[i, 1] - y[i, 2]) == 1); It took a bit of thought to fill in the `pips` array with the number of spots on each domino. In a normal programming language, one would loop over the dominoes and store the values into `pips`. However, here it is done with a constraint so the solver makes sure the values are assigned. Specifically, for each half-domino, the `pips` array entry at the domino's x/y coordinate must equal the corresponding `spots` on the domino: constraint forall(i in DOMINO, j in HALF) (pips[y[i,j], x[i, j]] == spots[i, j]); I decided to add another array to keep track of which domino is in which position. This array is useful to see the domino locations in the output, but it also keeps dominoes from overlapping. I used a constraint to put each domino's number (1, 2, 3, etc.) 
into the occupied position of `dominogrid`: constraint forall(i in DOMINO, j in HALF) (dominogrid[y[i,j], x[i, j]] == i); Next, how do we make sure that dominoes only go into positions allowed by `grid`? I used a constraint that a square in `dominogrid` must be empty or the corresponding `grid` must allow a domino.3 This uses the "or" condition, which is expressed as `\/`, an unusual stylistic choice. (Likewise, "and" is expressed as `/\`. These correspond to the logical symbols ∨ and ∧.) constraint forall(i in 1..H, j in 1..W) (dominogrid[i, j] == 0 \/ grid[i, j] != 0); Honestly, I was worried that I had too many arrays and the solver would end up in a rathole ensuring that the arrays were consistent. But I figured I'd try this brute-force approach and see if it worked. It turns out that it worked for the most part, so I didn't need to do anything more clever. Finally, the program requires a few lines to define some constants and variables. The constants below define the number of dominoes and the size of the grid for a particular problem: int: NDOMINO = 4; % Number of dominoes in the puzzle int: W = 3; % Width of the grid in this puzzle int: H = 3; % Height of the grid in this puzzle Next, datatypes are defined to specify the allowable values. This is very important for the solver; it is a "finite domain" solver, so limiting the size of the domains reduces the size of the problem. For this problem, the values are integers in a particular range, called a `set`: set of int: DOMINO = 1..NDOMINO; % Dominoes are numbered 1 to NDOMINO set of int: HALF = 1..2; % The domino half is 1 or 2 set of int: xcoord = 1..W; % Coordinate into the grid set of int: ycoord = 1..H; At last, I define the sizes and types of the various arrays that I use. One very important syntax is `var`, which indicates variables that the solver must determine. Note that the first two arrays, `grid` and `spots` do not have `var` since they are constant, initialized to specify the problem. 
array[1..H,1..W] of 0..1: grid; % The grid defining where dominoes can go
array[DOMINO, HALF] of int: spots; % The number of spots on each half of each domino
array[DOMINO, HALF] of var xcoord: x; % X coordinate of each domino half
array[DOMINO, HALF] of var ycoord: y; % Y coordinate of each domino half
array[1..H,1..W] of var 0..6: pips; % The number of pips (0 to 6) at each location.
array[1..H,1..W] of var 0..NDOMINO: dominogrid; % The domino sequence number at each location

You can find all the code on GitHub. One weird thing is that because the code is not procedural, the lines can be in any order. You can use arrays or constants before you define them. You can even move `include` statements to the end of the file if you want!

## Complications

Overall, the solver was much easier to use than I expected. However, there were a few complications. By changing a setting, the solver can find multiple solutions instead of stopping after the first. However, when I tried this, the solver generated thousands of meaningless solutions. A closer look showed that the problem was that the solver was putting arbitrary numbers into the "empty" cells, creating valid but pointlessly different solutions. It turns out that I didn't explicitly forbid this, so the sneaky constraint solver went ahead and generated tons of solutions that I didn't want. Adding another constraint fixed the problem. The moral is that even if you think your constraints are clear, solvers are very good at finding unwanted solutions that technically satisfy the constraints.4

A second problem is that if you do something wrong, the solver simply says that the problem is unsatisfiable. Maybe there's a clever way of debugging, but I ended up removing constraints until the problem could be satisfied and then looking at what I did wrong with the last constraint removed. (For instance, I got the array indices backward at one point, making the problem insoluble.)
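When the solver reports "unsatisfiable" or produces surprising output, one way to build confidence is to check a candidate solution outside the solver. The sketch below is an illustration in Python, not part of the original MiniZinc program: the function name `check` is hypothetical, the four arrays are copied from the easy puzzle above, and the indices are shifted from MiniZinc's 1-based `[row, column]` to Python's 0-based lists.

```python
# Data copied from the article (MiniZinc arrays are 1-based; these lists are 0-based).
GRID = [[1, 1, 0], [1, 1, 1], [1, 1, 1]]
SPOTS = [[5, 1], [1, 4], [4, 2], [1, 3]]
PIPS = [[4, 2, 0], [4, 5, 3], [1, 1, 1]]
DOMINOGRID = [[3, 3, 0], [2, 1, 4], [2, 1, 4]]


def check(grid, spots, pips, dominogrid):
    """Return a list of violated conditions (empty if the solution is valid)."""
    errors = []
    height, width = len(grid), len(grid[0])

    # The three puzzle-specific conditions.
    if pips[0][0] + pips[1][0] != 8:
        errors.append("purple cells must sum to 8")
    if not pips[1][2] < 5:
        errors.append("red cell must be less than 5")
    if len(set(pips[2])) != 1:
        errors.append("green cells must be equal")

    for r in range(height):
        for c in range(width):
            # A cell holds a domino exactly when the grid allows one there.
            if (dominogrid[r][c] == 0) != (grid[r][c] == 0):
                errors.append(f"cell ({r},{c}) disagrees with grid")
            # Unused cells must not carry arbitrary pip values.
            if grid[r][c] == 0 and pips[r][c] != 0:
                errors.append(f"unused cell ({r},{c}) has pips")

    for number, halves in enumerate(spots, start=1):
        cells = [(r, c) for r in range(height) for c in range(width)
                 if dominogrid[r][c] == number]
        if len(cells) != 2:
            errors.append(f"domino {number} does not cover two cells")
            continue
        (r1, c1), (r2, c2) = cells
        # The two halves must be horizontal or vertical neighbors.
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            errors.append(f"domino {number} halves are not adjacent")
        # The pips under the domino must match its two halves.
        if sorted(pips[r][c] for r, c in cells) != sorted(halves):
            errors.append(f"domino {number} pips do not match its spots")
    return errors


print(check(GRID, SPOTS, PIPS, DOMINOGRID))
```

An empty list means every condition holds. Swapping two array indices, the mistake mentioned above, would show up here as a named failure instead of a bare "unsatisfiable".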
The most concerning issue is the unpredictability of the solver: maybe it will take milliseconds or maybe it will take hours. For instance, the Oct 5 hard Pips puzzle (below) caused the solver to take minutes for no apparent reason. However, the MiniZinc IDE supports different solver backends. I switched from the default Gecode solver to Chuffed, and it immediately found numerous solutions, 384 to be precise. (The Pips puzzles sometimes have multiple solutions, which players find controversial.) I suspect that the multiple solutions messed up the Gecode solver somehow, perhaps because it couldn't narrow down a "good" branch in the search tree. For a benchmark of the different solvers, see the footnote.5 Two of the 384 solutions to the NYT Pips puzzle from Oct 5, 2025 (hard difficulty).

## How does a constraint solver work?

If you were writing a program to solve Pips from scratch, you'd probably have a loop to try assigning dominoes to positions. The difficulty is that the search space grows exponentially. If you have 16 dominoes, there are 16 choices for the first domino, 15 choices for the second, and so forth, so about 16! combinations in total, and that's ignoring orientations. You can think of this as a search tree: at the first step, you have 16 branches. For the next step, each branch has 15 sub-branches. Each sub-branch has 14 sub-sub-branches, and so forth. An easy optimization is to check the constraints after each domino is added. For instance, as soon as the "less than 5" constraint is violated, you can backtrack and skip that entire section of the tree. In this way, only a subset of the tree needs to be searched; the number of branches will be large, but hopefully manageable. A constraint solver works similarly, but in a more abstract way. The constraint solver assigns values to the variables, backtracking when a conflict is detected. Since the underlying problem is typically NP-complete, the solver uses heuristics to attempt to improve performance.
For instance, variables can be assigned in different orders. The solver attempts to generate conflicts as soon as possible so large pieces of the search tree can be pruned sooner rather than later. (In the domino case, this corresponds to placing dominoes in places with the tightest constraints, rather than scattering them around the puzzle in "easy" spots.) Another technique is constraint propagation. The idea is that you can derive new constraints and catch conflicts earlier. For instance, suppose you have a problem with the constraints "a equals c" and "b equals c". If you assign "a=1" and "b=2", you won't find a conflict until later, when you try to find a value for "c". But with constraint propagation, you can derive a new constraint "a equals b", and the problem will turn up immediately. (Solvers handle more complicated constraint propagation, such as inequalities.) The tradeoff is that generating new constraints takes time and makes the problem larger, so constraint propagation can make the solver slower. Thus, heuristics are used to decide when to apply constraint propagation. Researchers are actively developing new algorithms, heuristics, and optimizations6 such as backtracking more aggressively (called "backjumping"), keeping track of failing variable assignments (called "nogoods"), and leveraging Boolean SAT (satisfiability) solvers. Solvers compete in annual challenges to test these techniques against each other. The nice thing about a constraint solver is that you don't need to know anything about these techniques; they are applied automatically. ## Conclusions I hope this has convinced you that constraint solvers are interesting, not too scary, and can solve real problems with little effort. Even as a beginner, I was able to get started with MiniZinc quickly. (I read half the tutorial and then jumped into programming.) One reason to look at constraint solvers is that they are a completely different programming paradigm. 
Using a constraint solver is like programming on a higher level, not worrying about how the problem gets solved or what algorithm gets used. Moreover, analyzing a problem in terms of constraints is a different way of thinking about algorithms. Some of the time it's frustrating when you can't use familiar constructs such as loops and assignments, but it expands your horizons. Finally, writing code to solve Pips is more fun than solving the problems by hand, at least in my opinion, so give it a try! For more, follow me on Bluesky (@righto.com), Mastodon (@kenshirriff@oldbytes.space), RSS, or subscribe here. Solution to the Pips puzzle, September 21, 2025 (hard). This puzzle has regions that must all be equal (=) and regions that must all be different (≠). Conveniently, MiniZinc has `all_equal` and `alldifferent` constraint functions.

## Notes and references

1. I started by downloading the MiniZinc IDE and reading the MiniZinc tutorial. The MiniZinc IDE is straightforward, with an editor window at the top and an output window at the bottom. Clicking the "Run" button causes it to generate a solution. Screenshot of the MiniZinc IDE. Click for a larger view. ↩
2. It might be cleaner to combine the X and Y coordinates into a single `Point` type, using a MiniZinc record type. ↩
3. I later decided that it made more sense to enforce that `dominogrid` is empty if and only if `grid` is 0 at that point, although it doesn't affect the solution. This constraint uses the "if and only if" operator `<->`. constraint forall(i in 1..H, j in 1..W) (dominogrid[i, j] == 0 <-> grid[i, j] == 0); ↩
4. To prevent the solver from putting arbitrary numbers in the unused positions of `pips`, I added a constraint to force these values to be zero: constraint forall(i in 1..H, j in 1..W) (grid[i, j] == 0 -> pips[i, j] == 0); Generating multiple solutions had a second issue, which I expected: A symmetric domino can be placed in two redundant ways.
For instance, a double-six domino can be flipped to produce a solution that is technically different but looks the same. I fixed this by adding constraints for each symmetric domino to allow only one of the two redundant positions. The constraint below forces a preferred orientation for symmetric dominoes. constraint forall(i in DOMINO) (spots[i,1] != spots[i,2] \/ x[i,1] > x[i,2] \/ (x[i,1] == x[i,2] /\ y[i,1] > y[i,2])); To enable multiple solutions in MiniZinc, the setting is under Show Configuration Editor > User Defined Behavior > Satisfaction Problems or the `--all` flag from the command line. ↩
5. MiniZinc has five solvers that can solve this sort of integer problem: Chuffed, OR-Tools CP-SAT, Gecode, HiGHS, and Coin-OR BC. I measured the performance of the five solvers against 20 different Pips puzzles. Most of the solvers found solutions in under a second, most of the time, but there is a lot of variation. Timings for different solvers on 20 Pips puzzles. Overall, Chuffed had the best performance on the puzzles that I tested, taking well under a second. Google's OR-Tools won all the categories in the 2025 MiniZinc challenge, but it was considerably slower than Chuffed for my Pips programs. The default Gecode solver performed very well most of the time, but it did terribly on a few problems, taking over 15 minutes. HiGHS was slower in general, taking a few minutes on the hardest problems, but it didn't fail as badly as Gecode. (Curiously, Gecode and HiGHS sometimes found different problems to be difficult.) Finally, Coin-OR BC was uniformly bad; at best it took a few seconds, but one puzzle took almost two hours and others weren't solved before I gave up after two hours. (I left Coin-OR BC off the graph because it messed up the scale.) Don't treat these results too seriously because different solvers are optimized for different purposes. (In particular, Coin-OR BC is designed for linear problems.)
But the results demonstrate the unpredictability of solvers: maybe you get a solution in a second and maybe you get a solution in hours. ↩ 6. If you want to read more about solvers, Constraint Satisfaction Problems is an overview presentation. The Gecode algorithms are described in a nice technical report: Constraint Programming Algorithms used in Gecode. Chuffed is more complicated: "Chuffed is a state of the art lazy clause solver designed from the ground up with lazy clause generation in mind. Lazy clause generation is a hybrid approach to constraint solving that combines features of finite domain propagation and Boolean satisfiability." The Chuffed paper Lazy clause generation reengineered and slides are more of a challenge. ↩
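For comparison with the constraint-solver approach, the from-scratch search described in "How does a constraint solver work?" can be sketched directly. This is a minimal Python illustration for the easy Oct 5 puzzle only, not the author's code: the names `violates` and `solve` are hypothetical, the three region conditions are hard-coded, and the only optimization is checking the conditions after each domino is placed.

```python
GRID = [[1, 1, 0], [1, 1, 1], [1, 1, 1]]       # 1 = cell can hold a domino half
DOMINOES = [(5, 1), (1, 4), (4, 2), (1, 3)]    # spots on each half
H, W = 3, 3


def violates(pips):
    """True if the partly filled grid already breaks a condition (None = empty)."""
    top, below = pips[0][0], pips[1][0]
    if top is not None and below is not None and top + below != 8:
        return True                             # purple cells must sum to 8
    red = pips[1][2]
    if red is not None and red >= 5:
        return True                             # red cell must be less than 5
    green = [v for v in pips[2] if v is not None]
    return len(set(green)) > 1                  # green cells must be equal


def solve():
    """Return every distinct pips grid that satisfies the puzzle."""
    pips = [[None] * W for _ in range(H)]
    used = [False] * len(DOMINOES)
    solutions = []

    def first_empty():
        for r in range(H):
            for c in range(W):
                if GRID[r][c] and pips[r][c] is None:
                    return r, c
        return None

    def search():
        cell = first_empty()
        if cell is None:                        # every cell is covered
            solutions.append([[0 if v is None else v for v in row] for row in pips])
            return
        r, c = cell
        # The first empty cell in row-major order can only pair right or down.
        for dr, dc in ((0, 1), (1, 0)):
            r2, c2 = r + dr, c + dc
            if r2 >= H or c2 >= W or not GRID[r2][c2] or pips[r2][c2] is not None:
                continue
            for i, (p, q) in enumerate(DOMINOES):
                if used[i]:
                    continue
                # A set skips the redundant flip of a symmetric domino.
                for first, second in {(p, q), (q, p)}:
                    pips[r][c], pips[r2][c2] = first, second
                    used[i] = True
                    if not violates(pips):      # prune this branch on a conflict
                        search()
                    used[i] = False
                    pips[r][c], pips[r2][c2] = None, None

    search()
    return solutions


for solution in solve():
    print(solution)
```

Even for this tiny puzzle, the procedural version needs explicit bookkeeping for placement, undoing, and pruning, which is exactly the part a constraint solver takes over.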

www.righto.com

November 1, 2025 at 7:29 AM

A Navajo weaving of an integrated circuit: the 555 timer

The noted Diné (Navajo) weaver Marilou Schultz recently completed an intricate weaving composed of thick white lines on a black background, punctuated with reddish-orange diamonds. Although this striking rug may appear abstract, it shows the internal circuitry of a tiny silicon chip known as the 555 timer. This chip has hundreds of applications in everything from a sound generator to a windshield wiper controller. At one point, the 555 was the world's best-selling integrated circuit with billions sold. But how did the chip get turned into a rug? "Popular Chip" by Marilou Schultz. Photo courtesy of First American Art Magazine. The 555 chip is constructed from a tiny flake of silicon with a layer of metallic wiring on top. In the rug, this wiring is visible as the thick white lines, while the silicon forms the black background. One conspicuous feature of the rug is the reddish-orange diamonds around the perimeter. These correspond to the connections between the silicon chip and its eight pins. Tiny golden bond wires—thinner than a human hair—are attached to the square bond pads to provide these connections. The circuitry of the 555 chip contains 25 transistors, silicon devices that can switch on and off. The rug is dominated by three large transistors, the filled squares with a 王 pattern inside, while the remaining transistors are represented by small dots. The weaving was inspired by a photo of the 555 timer die taken by Antoine Bercovici (Siliconinsider); I suggested this photo to Schultz as a possible subject for a rug. The diagram below compares the weaving (left) with the die photo (right). As you can see, the weaving closely follows the actual chip, but there are a few artistic differences. For instance, two of the bond pads have been removed, the circuitry at the top has been simplified, and the part number at the bottom has been removed. A comparison of the rug (left) and the original photograph (right). 
Dark-field image of the 555 timer is courtesy of Antoine Bercovici. Antoine took the die photo with a dark field microscope, a special type of microscope that produces an image on a black background. This image emphasizes the metal layer on the top of the die. In comparison, a standard bright-field microscope produced the image below. When a chip is manufactured, regions of silicon are "doped" with impurities to create transistors and resistors. These regions are visible in the image below as subtle changes in the color of the silicon. The RCA CA555 chip. Photo courtesy of Tiny Transistors. In the weaving, the chip's design appears almost monumental, making it easy to forget that the actual chip is microscopic. For the photo below, I obtained a version of the chip packaged in a metal can, rather than the typical rectangle of black plastic. Cutting the top off the metal can reveals the tiny chip inside, with eight gold bond wires connecting the die to the pins of the package. If you zoom in on the photo, you may recognize the three large transistors that dominate the rug. The 555 timer die inside a metal-can package, with a penny for comparison. Click this image (or any other) for a larger version. The artist, Marilou Schultz, has been creating chip rugs since 1994, when Intel commissioned a rug based on the Pentium as a gift to AISES (American Indian Science & Engineering Society). Although Schultz learned weaving as a child, the Pentium rug was a challenge due to its complex pattern and lack of symmetry; a day's work might add just an inch to the rug. This dramatic weaving was created with wool from the long-horned Navajo-Churro sheep, colored with traditional plant dyes. "Replica of a Chip", created by Marilou Schultz, 1994. Wool. Photo taken at the National Gallery of Art, 2024. For the 555 timer weaving, Schultz experimented with different materials. Silver and gold metallic threads represent the aluminum and copper in the chip. 
The artist explains that "it took a lot more time to incorporate the metallic threads," but it was worth the effort because "it is spectacular to see the rug with the metallics in the dark with a little light hitting it." Aniline dyes provided the black and lavender colors. Although natural logwood dye produces a beautiful purple, it fades over time, so Schultz used an aniline dye instead. The lavender colors are dedicated to the weaver's mother, who passed away in February; purple was her favorite color.

## Inside the chip

How does the 555 chip produce a particular time delay? You add external components (resistors and a capacitor) to select the time. The capacitor is filled (charged) at a speed controlled by the resistor. When the capacitor gets "full", the 555 chip switches operation and starts emptying (discharging) the capacitor. It's like filling a sink: if you have a large sink (capacitor) and a trickle of water (large resistor), the sink fills slowly. But if you have a small sink (capacitor) and a lot of water (small resistor), the sink fills quickly. By using different resistors and capacitors, the 555 timer can provide time intervals from microseconds to hours. I've constructed an interactive chip browser that shows how the regions of the rug correspond to specific electronic components in the physical chip. Click on any part of the rug to learn the function of the corresponding component in the chip. Click the die or schematic for details... For instance, two of the large square transistors turn the chip's output on or off, while the third large transistor discharges the capacitor when it is full. (To be precise, the capacitor goes between 1/3 full and 2/3 full to avoid issues near "empty" and "full".) The chip has circuits called comparators that detect when the capacitor's voltage reaches 1/3 or 2/3, switching between emptying and filling at those points.
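The 1/3 and 2/3 thresholds make the timing easy to estimate. The sketch below is a simplified illustration with assumptions that go beyond the text above: it models a single ideal resistor and capacitor (practical 555 circuits usually use two resistors), it ignores the chip's own delays, and the function name `time_between_thirds` is hypothetical. A capacitor charging through a resistor closes the gap to the supply voltage exponentially, so going from 1/3 to 2/3 halves that gap and takes R × C × ln 2. Discharging from 2/3 to 1/3 toward zero halves the voltage and takes the same time.

```python
import math


def time_between_thirds(resistance_ohms, capacitance_farads):
    """Seconds for an ideal RC circuit to move between 1/3 and 2/3 of the supply.

    The distance to the target voltage halves over this interval,
    which takes R * C * ln(2) for an exponential charge or discharge.
    """
    return resistance_ohms * capacitance_farads * math.log(2)


# A large sink with a trickle of water: 1 megohm and 1 microfarad.
print(time_between_thirds(1e6, 1e-6))

# A small sink with a lot of water: 1 kilohm and 1 nanofarad.
print(time_between_thirds(1e3, 1e-9))
```

The two examples differ by a factor of a million, which gives a feel for how component choice spans the range from microseconds to much longer intervals.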
If you want more technical details about the 555 chip, see my previous articles: an early 555 chip, a 555 timer similar to the rug, and a more modern CMOS version of the 555. ## Conclusions The similarities between Navajo weavings and the patterns in integrated circuits have long been recognized. Marilou Schultz's weavings of integrated circuits make these visual metaphors into concrete works of art. This connection is not just metaphorical, however; in the 1960s, the semiconductor company Fairchild employed numerous Navajo workers to assemble chips in Shiprock, New Mexico. I wrote about this complicated history in The Pentium as a Navajo Weaving. This work is being shown at SITE Santa Fe's Once Within a Time exhibition (running until January 2026). I haven't seen the exhibition in person, so let me know if you visit it. For more about Marilou Schultz's art, see The Diné Weaver Who Turns Microchips Into Art, or A Conversation with Marilou Schultz on YouTube. Many thanks to Marilou Schultz for discussing her art with me. Thanks to First American Art Magazine for providing the photo of her 555 rug. Follow me on Mastodon (@kenshirriff@oldbytes.space), Bluesky (@righto.com), or RSS for updates.

www.righto.com

September 15, 2025 at 5:26 AM


Why do people keep writing about the imaginary compound Cr2Gr2Te6?

I was reading the latest issue of the journal _Science_, and a paper mentioned the compound Cr2Gr2Te6. For a moment, I thought my knowledge of the periodic table was slipping, since I couldn't remember the element Gr. It turns out that _Gr_ was supposed to be _Ge_, germanium, but that raises two issues. First, shouldn't the peer reviewers and proofreaders at a top journal catch this error? But more curiously, it appears that Cr2Gr2Te6 is a mistake that has been copied around several times.

The _Science_ paper [1] states, "Intrinsic ferromagnetism in these materials was discovered in Cr2Gr2Te6 and CrI3 down to the bilayer and monolayer thickness limit in 2017." I checked the referenced paper [2] and verified that the correct compound is Cr2**Ge**2Te6, with Ge for germanium.

But in the process, I found more publications that _specifically_ mention the 2017 discovery of intrinsic ferromagnetism in both Cr2Gr2Te6 and CrI3. A 2021 paper in _Nanoscale_ [3] says, "Since the discovery of intrinsic ferromagnetism in atomically thin Cr2Gr2Te6 and CrI3 in 2017, research on two-dimensional (2D) magnetic materials has become a highlighted topic." Then, a 2023 book chapter [4] opens with the abstract: "Since the discovery of intrinsic long-range magnetic order in two-dimensional (2D) layered magnets, e.g., Cr2Gr2Te6 and CrI3 in 2017, [...]"

This illustrates how easy it is for a random phrase to get copied around with nobody checking it. (Earlier, I found a bogus computer definition that has persisted for over 50 years.) To be sure, these could all be independent typos; it's an easy typo to make since G**e** and G**r** are neighbors on the keyboard and Cr2Gr2 scans better than Cr2Ge2. A few other papers [5, 6, 7] have the same typo, but in different contexts. My bigger concern is that once AI picks up the erroneous formula Cr2Gr2Te6, it will propagate as misinformation forever. I hope that by calling out this error, I can bring an end to it.
In any case, if anyone ends up here after a web search, I can at least confirm that there isn't a new element Gr and the real compound is Cr2**Ge**2Te6, chromium germanium telluride.

A shiny crystal of Cr2Ge2Te6 about 5mm across. Photo courtesy of 2D Semiconductors, a supplier of quantum materials.

## References

[1] He, B. et al. (2025) ‘Strain-coupled, crystalline polymer-inorganic interfaces for efficient magnetoelectric sensing’, _Science_, 389(6760), pp. 623–631. (link)

[2] Gong, C. et al. (2017) ‘Discovery of intrinsic ferromagnetism in two-dimensional van der Waals crystals’, _Nature_, 546(7657), pp. 265–269. (link)

[3] Zhang, S. et al. (2021) ‘Two-dimensional magnetic materials: structures, properties and external controls’, _Nanoscale_, 13(3), pp. 1398–1424. (link)

[4] Yin, T. (2024) ‘Novel Light-Matter Interactions in 2D Magnets’, in D. Ranjan Sahu (ed.) _Modern Permanent Magnets - Fundamentals and Applications._ (link)

[5] Zhao, B. et al. (2023) ‘Strong perpendicular anisotropic ferromagnet Fe3GeTe2/graphene van der Waals heterostructure’, _Journal of Physics D: Applied Physics_, 56(9) 094001. (link)

[6] Ren, H. and Lan, M. (2023) ‘Progress and Prospects in Metallic FexGeTe2 (3≤x≤7) Ferromagnets’, _Molecules_, 28(21), p. 7244. (link)

[7] Hu, S. et al. (2019) 'Anomalous Hall effect in Cr2Gr2Te6/Pt hybride structure', Taiwan-Japan Joint Workshop on Condensed Matter Physics for Young Researchers, Saga, Japan. (link)

www.righto.com

August 26, 2025 at 5:07 AM


Here be dragons: Preventing static damage, latchup, and metastability in the 386

I've been reverse-engineering the Intel 386 processor (from 1985), and I've come across some interesting circuits for the chip's input/output (I/O) pins. Since these pins communicate with the outside world, they face special dangers: static electricity and latchup can destroy the chip, while metastability can cause serious malfunctions. These I/O circuits are completely different from the logic circuits in the 386, and I've come across a previously-undescribed flip-flop circuit, so I'm venturing into uncharted territory. In this article, I take a close look at how the I/O circuitry protects the 386 from the "dragons" that can destroy it.

The 386 die, zooming in on some of the bond pad circuits. The colors change due to the effects of different microscope lenses. Click this image (or any other) for a larger version.

The photo above shows the die of the 386 under a microscope. The dark, complex patterns arranged in rectangular regions arise from the two layers of metal that connect the circuits on the 386 chip. Not visible are the transistors, formed from silicon and polysilicon and hidden beneath the metal. Around the perimeter of this fingernail-sized silicon die, 141 square bond pads provide the connections between the chip and the outside world; tiny gold bond wires connect the bond pads to the package. Next to each I/O pad, specialized circuitry provides the electrical interface between the chip and the external components while protecting the chip. I've zoomed in on three groups of these bond pads along with the associated I/O circuits. The circuits at the top (for data pins) and the left (for address pins) are completely different from the control pin circuits at the bottom, showing how the circuitry varies with the pin's function.

## Static electricity

The first dragon that threatens the 386 is static electricity, able to burn a hole in the chip. MOS transistors are constructed with a thin insulating oxide layer underneath the transistor's gate.
In the 386, this fragile, glass-like oxide layer is just 250 nm thick, the thickness of a virus. Static electricity, even a small amount, can blow a hole through this oxide layer and destroy the chip. If you've ever walked across a carpet and felt a spark when you touch a doorknob, you've generated at least 3000 volts of chip-destroying static electricity. Intel recommends an anti-static mat and a grounding wrist strap when installing a processor to avoid the danger of static electricity, also known as Electrostatic Discharge or ESD.1 To reduce the risk of ESD damage, chips have protection diodes and other components in their I/O circuitry. The schematic below shows the circuit for a typical 386 input. The goal is to prevent static discharge from reaching the inverter, where it could destroy the inverter's transistors. The diodes next to the pad provide the first layer of protection; they redirect excess voltage to the +5 rail or ground. Next, the resistor reduces the current that can reach the inverter. The third diode provides a final layer of protection. (One unusual feature of this input—unrelated to ESD—is that the input has a pull-up, which is implemented with a transistor that acts like a 20kΩ resistor.2) Schematic for the `BS16#` pad circuit. The `BS16#` signal indicates to the 386 if the external bus is 16 bits or 32 bits. The image below shows how this circuit appears on the die. For this photo, I dissolved the metal layers with acids, stripping the die down to the silicon to make the transistors visible. The diodes and pull-up resistor are implemented with transistors.3 Large grids of transistors form the pad-side diodes, while the third diode is above. The current-limiting protection resistor is implemented with polysilicon, which provides higher resistance than metal wiring. The capacitor is implemented with a plate of polysilicon over silicon, separated by a thin oxide layer. 
As you can see, the protection circuitry occupies much more area than the inverters that process the signal.

The circuit for BS16# on the die. The green areas are where the oxide layer was incompletely removed.

## Latchup

The transistors in the 386 are created by doping silicon with impurities to change its properties, creating regions of "N-type" and "P-type" silicon. The 386 chip, like most processors, is built from CMOS technology, so it uses two types of transistors: NMOS and PMOS. The 386 starts from a wafer of N-type silicon and PMOS transistors are formed by doping tiny regions to form P-type silicon embedded in the underlying N-type silicon. NMOS transistors are the opposite, with N-type silicon embedded in P-type silicon. To hold the NMOS transistors, "wells" of P-type silicon are formed, as shown in the cross-section diagram below. Thus, the 386 chip contains complex patterns of P-type and N-type silicon that form its 285,000 transistors.

The structure of NMOS and PMOS transistors in the 386 forms parasitic NPN and PNP transistors. This diagram is the opposite of other latchup diagrams because the 386 uses N substrate, the opposite of modern chips with P substrate.

But something dangerous lurks below the surface, the fire-breathing dragon of latchup waiting to burn up the chip. The problem is that these regions of N-type and P-type silicon form unwanted, "parasitic" transistors underneath the desired transistors. In normal circumstances, these parasitic NPN and PNP transistors are inactive and can be ignored.
But if a current flows beneath the surface, through the silicon substrate, it can turn on a parasitic transistor and awaken the dreaded latchup.4 The parasitic transistors form a feedback loop, so if one transistor starts to turn on, it turns on the other transistor, and so forth, until both transistors are fully on, a state called latchup.5 Moreover, the feedback loop will maintain latchup until the chip's power is removed.6 During latchup, the chip's power and ground are shorted through the parasitic transistors, causing high current flow that can destroy the chip by overheating it or even melting bond wires. Latchup can be triggered in many ways, from power supply overvoltage to radiation, but a chip's I/O pins are the primary risk because signals from the outside world are unpredictable. For instance, suppose a floppy drive is connected to the 386 and the drive sends a signal with a voltage higher than the 386's 5-volt supply. (This could happen due to a voltage surge in the drive, reflection in a signal line, or even connecting a cable.) Current will flow through the 386's protection diodes, the diodes that were described in the previous section.7 If this current flows through the chip's silicon substrate, it can trigger latchup and destroy the processor. Because of this danger, the 386's I/O pads are designed to prevent latchup. One solution is to block the unwanted currents through the substrate, essentially putting fences around the transistors to keep malicious currents from escaping into the substrate. In the 386, this fence consists of "guard rings" around the I/O transistors and diodes. These rings prevent latchup by blocking unwanted current flow and safely redirecting it to power or ground. The circuitry for the W/R# output pad. (The W/R# signal tells the computer's memory and I/O if the 386 is performing a write operation or a read operation.) I removed the metal and polysilicon to show the underlying silicon. 
The diagram above shows the double guard rings for a typical I/O pad.8 Separate guard rings protect the NMOS transistors and the PMOS transistors. The NMOS transistors have an inner guard ring of P-type silicon connected to ground (blue) and an outer guard ring of N-type silicon connected to +5 (red). The rings are reversed for the PMOS transistors. The guard rings take up significant space on the die, but this space isn't wasted since the rings protect the chip from latchup.

## Metastability

The final dragon is metastability: it (probably) won't destroy the chip, but it can cause serious malfunctions.9 Metastability is a peculiar problem where a digital signal can take an unbounded amount of time to settle into a zero or a one. In other words, the circuit temporarily refuses to act digitally and shows its underlying analog nature.10 Metastability was controversial in the 1960s and the 1970s, with many electrical engineers not believing it existed or considering it irrelevant. Nowadays, metastability is well understood, with special circuits to prevent it, but metastability can never be completely eliminated.

In a processor, everything is synchronized to its clock. While a modern processor has a clock speed of several gigahertz, the 386's clock ran at 12 to 33 megahertz. Inside the processor, signals are carefully organized to change according to the clock; that's why your computer runs faster with a higher clock speed. The problem is that external signals may be independent of the CPU's clock. For instance, a disk drive could send an interrupt to the computer when data is ready, which depends on the timing of the spinning disk. If this interrupt arrives at just the wrong time, it can trigger metastability.

A metastable signal settling to a high or low signal after an indefinite time. This image was used to promote a class on metastability in 1974. From My Work on All Things Metastable by Thomas Chaney.
In more detail, processors use flip-flops to hold signals under the control of the clock. An "edge-triggered" flip-flop grabs its input at the moment the clock goes high (the "rising edge") and holds this value until the next clock cycle. Everything is fine if the value is stable when the clock changes: if the input signal switches from low to high before the clock edge, the flip-flop will hold this high value. And if the input signal switches from low to high _after_ the clock edge, the flip-flop will hold the low value, since the input was low at the clock edge. But what happens if the input changes from low to high at the exact time that the clock switches? Usually, the flip-flop will pick either low or high. But very rarely, maybe a few times out of a billion, the flip-flop will hesitate in between, neither low nor high. The flip-flop may take a few nanoseconds before it "decides" on a low or high value, and the value will be intermediate until then. The photo above illustrates a metastable signal, spending an unpredictable time between zero and one before settling on a value. The situation is similar to a ball balanced on top of a hill, a point of unstable equilibrium.11 The smallest perturbation will knock the ball down one of the two stable positions at the bottom of the hill, but you don't know which way it will go or how long it will take. A metaphorical view of metastability as a ball on a hill, able to roll down either side. Metastability is serious because if a digital signal has a value that is neither 0 nor 1 then downstream circuitry may get confused. For instance, if part of the processor thinks that it received an interrupt and other parts of the processor think that no interrupt happened, chaos will reign as the processor takes contradictory actions. Moreover, waiting a few nanoseconds isn't a cure because the duration of metastability can be arbitrarily long. 
Waiting helps, since the chance of metastability decreases exponentially with time, but there is no guarantee.12 The obvious solution is to never change an input exactly when the clock changes. The processor is designed so that internal signals are stable when the clock changes, avoiding metastability. Specifically, the designer of a flip-flop specifies the _setup_ time—how long the signal must be stable before the clock edge—and the _hold_ time—how long the signal must be stable after the clock edge. As long as the input satisfies these conditions, typically a few picoseconds long, the flip-flop will function without metastability. Unfortunately, the setup and hold times can't be guaranteed when the processor receives an external signal that isn't synchronized to its clock, known as an asynchronous signal. For instance, a processor receives interrupt signals when an I/O device has data, but the timing is unpredictable because it depends on mechanical factors such as a keypress or a spinning floppy disk. Most of the time, everything will work fine, but what about the one-in-a-billion case where the timing of the signal is unlucky? (Since modern processors run at multi-gigahertz, one-in-a-billion events are not rare; they can happen multiple times per second.) One solution is a circuit called a synchronizer that takes an asynchronous signal and synchronizes it to the clock. A synchronizer can be implemented with two flip-flops in series: even if the first flip-flop has a metastable output, chances are that it will resolve to 0 or 1 before the second flip-flop stores the value. Each flip-flop provides an exponential reduction in the chance of metastability, so using two flip-flops drastically reduces the risk. In other words, the circuit will still fail occasionally, but if the mean time between failures (MTBF) is long enough (say, decades instead of seconds), then the risk is acceptable. 
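To put numbers on the "decades instead of seconds" idea, here is a minimal sketch using the widely quoted textbook model for synchronizer reliability, MTBF = exp(t / τ) / (T_w × f_clock × f_data). This model is an assumption brought in for illustration; it is not derived from the 386's circuitry, and every parameter value below (the time constant τ, the vulnerable window T_w, the clock rate, and the event rate) is hypothetical.

```python
import math

def synchronizer_mtbf(settle_time_s, tau_s, window_s, clock_hz, data_hz):
    """Mean time between synchronizer failures under the textbook model:
    the chance that a metastable state survives decays as exp(-t / tau),
    and an input event lands in the vulnerable window with probability
    window_s * clock_hz."""
    return math.exp(settle_time_s / tau_s) / (window_s * clock_hz * data_hz)

# Hypothetical values for illustration only (not measured 386 parameters).
TAU_S = 0.5e-9      # resolution time constant of the flip-flop
WINDOW_S = 0.1e-9   # vulnerable window around the clock edge
CLOCK_HZ = 16e6     # processor clock
DATA_HZ = 1e3       # rate of asynchronous input events

SECONDS_PER_YEAR = 365.25 * 24 * 3600

for settle_ns in (0, 10, 20, 40):
    mtbf = synchronizer_mtbf(settle_ns * 1e-9, TAU_S, WINDOW_S, CLOCK_HZ, DATA_HZ)
    print(f"settling time {settle_ns:2d} ns: MTBF {mtbf:.3g} s "
          f"({mtbf / SECONDS_PER_YEAR:.3g} years)")
```

In this model, every additional τ of settling time multiplies the MTBF by e, which is why giving the first flip-flop most of a clock period to resolve, or making it resolve faster, pays off so dramatically.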
The schematic for the BUSY# pin, showing the flip-flops that synchronize the input signal. The schematic above shows how the 386 uses two flip-flops to minimize metastability. The first flip-flop is a special flip-flop that is based on a sense amplifier. It is much more complicated than a regular flip-flop, but it responds faster, reducing the chance of metastability. It is built from two of the sense-amplifier latches below, which I haven't seen described anywhere. In a DRAM memory chip, a sense amplifier takes a weak signal from a memory cell and rapidly amplifies it into a solid 0 or 1. In this flip-flop, the sense amplifier takes a potentially ambiguous signal and rapidly amplifies it into a 0 or 1. By amplifying the signal quickly, the flip-flop reduces metastability. (See the footnote for details.14) The sense amplifier latch circuit. The die photo below shows how this circuitry looks on the die. Each flip-flop is built from two latches; note that the sense-amp latches are larger than the standard latches. As before, the pad has protection diodes inside guard rings. For some reason, however, these diodes have a different structure from the transistor-based diodes described earlier. The 386 has five inputs that use this circuitry to protect against metastability.13 These inputs are all located together at the bottom of the die—it probably makes the layout more compact when neighboring pad circuits are all the same size. The circuitry for the BUSY# pin, showing the special sense-amplifier latches that reduce metastability. In summary, the 386's I/O circuits are interesting because they are completely different from the chip's regular logic circuitry. In these circuits, the border between digital and analog breaks down; these circuits handle binary signals, but analog issues dominate the design. Moreover, hidden parasitic transistors play key roles; what you don't see can be more important than what you see. 
These circuits defend against three dangerous "dragons": static electricity, latchup, and metastability. Intel succeeded in warding off these dragons and the 386 was a success.

For more on the 386 and other chips, follow me on Mastodon (@kenshirriff@oldbytes.space), Bluesky (@righto.com), or RSS. (I've given up on Twitter.) If you want to read more about 386 input circuits, I wrote about the clock pin here.

## Notes and references

1. Anti-static precautions are specified in Intel's processor installation instructions. Also see Intel's Electrostatic Discharge and Electrical Overstress Guide. I couldn't find ESD ratings for the 386, but a modern Intel chip is tested to withstand 500 volts or 2000 volts, depending on the test procedure. ↩

2. The BS16# pin is slightly unusual because it has an internal pull-up resistor. If you look at the datasheet (9.2.3 and Table 9-3 footnotes), a few input pins (ERROR#, BUSY#, and BS16#) have internal pull-up resistors of 20 kΩ, while the PEREQ input pin has an internal pull-down resistor of 20 kΩ. ↩

3. The protection diode is probably a grounded-gate NMOS (ggNMOS), an NMOS transistor with the gate, source, and body (but not the drain) tied to ground. This forms a parasitic NPN transistor under the MOSFET that dissipates the ESD. (I think that the PMOS protection is the same, except the gate is pulled high, not grounded.) For output pins, the output driver MOSFETs have parasitic transistors that make the output driver "self-protected". One consequence is that the input pads and the output pads look similar (both have large MOS transistors), unlike other chips where the presence of large transistors indicates an output. (Even so, 386 outputs and inputs can be distinguished because outputs have large inverters inside the guard rings to drive the MOSFETs, while inputs do not.) Also see Practical ESD Protection Design. ↩

4. The 386 uses P-wells in an N-doped substrate.
The substrate is heavily doped with antimony, with a lightly doped N epitaxial layer on top. This doping helped provide immunity to latchup. (See "High performance technology, circuits and packaging for the 80386", ICCD 1986.) For the most part, modern chips use the opposite: N-wells with a P-doped substrate. Why the substrate change? In the earlier days of CMOS, P-well was standard due to the available doping technology, see N-well and P-well performance comparison. During the 1980s, there was controversy over which was better: P-well or N-well: "It is commonly agreed that P-well technology has a proven reliability record, reduced alpha-particle sensitivity, closer matched p- and n- channel devices, and high gain NPN structures. N-well proponents acknowledge better compatibility and performance with NMOS processing and designs, good substrate quality, availability, and cost, lower junction capacitance, and reduced body effects." (See Design of a CMOS Standard Cell Library.) As wafer sizes increased in the 1990s, technology shifted to P-doped substrates because it is difficult to make large N-doped wafers due to the characteristics of the dopants (link). Some chips optimize transistor characteristics by using both types of wells, called a twin-well process. For instance, the Pentium used P-doped wafers and implanted both N and P wells. (See Intel's 0.25 micron, 2.0 volts logic process technology.) ↩ 5. You can also view the parasitic transistors as forming an SCR (Silicon Controlled Rectifier), a four-layer semiconductor device. SCRs were popular in the 1970s because they could handle higher currents and voltages than transistors. But as high-power transistors were developed, SCRs fell out of favor. In particular, once an SCR is turned on, it stays on until power is removed or reversed; this makes SCRs harder to use than transistors. (This is the same characteristic that makes latchup so dangerous.) ↩ 6. 
Satellites and nuclear missiles have a high risk of latchup due to radiation. Since radiation-induced latchup cannot always be prevented, one technique for dealing with latchup is to detect the excessive current from latchup and then power-cycle the chip. For instance, you can buy a radiation-hardened current limiter chip that will detect excessive current due to latchup and temporarily remove power; this chip sells for the remarkable price of $1780. For more on latchup, see the Texas Instruments Latch-Up white paper, as well as Latch-Up, ESD, and Other Phenomena. ↩

7. The 80386 Hardware Reference Manual discusses how a computer designer can prevent latchup in the 386. The designer is assured that Intel's "CHMOS III" process prevents latchup under normal operating conditions. However, exceeding the voltage limits on I/O pins can cause current surges and latchup. Intel provides three guidelines: observe the maximum ratings for input voltages, never apply power to a 386 pin before the chip is powered up, and terminate I/O signals properly to avoid overshoot and undershoot. ↩

8. The circuit for the W/R# pin is similar to many other output pins. The basic idea is that a large PMOS transistor pulls the output high, while a large NMOS transistor pulls the output low. If the `enable` input is low, both transistors are turned off and the output floats. (This allows other devices to take over the bus in the HOLD state.)

Schematic for the W/R# pin driver.

The inverters that control the drive transistors have an unusual layout. These inverters are inside the guard rings, meaning that the inverters are split apart, with the NMOS transistors in one ring and PMOS transistors in the other. The extra wiring adds capacitance to the output which probably makes the inverters slightly slower. These inverters have a special design: one inverter is faster to go high than to go low, while the other inverter is the opposite.
The motivation is that if both drive transistors are on at the same time, a large current will flow through the transistors from power to ground, producing an unwanted current spike (and potentially latchup). To avoid this, the inverters are designed to turn one drive transistor off faster than turning the other one on. Specifically, the high-side inverter has an extra transistor to quickly pull its output high, while the low-side inverter has an extra transistor to pull the output low. Moreover, the inverter's extra transistor is connected directly to the drive transistors, while the inverter's main output connects through a longer polysilicon path with more resistance, providing an RC delay. I found this layout very puzzling until I realized that the designers were carefully controlling the turn-on and turn-off speeds of these inverters. ↩ 9. In Metastability and Synchronizers: A Tutorial, there's a story of a spacecraft power supply being destroyed by metastability. Supposedly, metastability caused the logic to turn on too many units, overloading and destroying the power supply. I suspect that this is a fictional cautionary tale, rather than an actual incident. For more on metastability, see this presentation and this writeup by Tom Chaney, one of the early investigators of metastability. ↩ 10. One of Vonada's Engineering Maxims is "Digital circuits are made from analog parts." Another maxim is "Synchronizing circuits may take forever to make a decision." These maxims and a dozen others are from Don Vonada in DEC's 1978 book Computer Engineering. ↩ 11. Curiously, the definition of metastability in electronics doesn't match the definition in physics and chemistry. In electronics, a metastable state is an unstable equilibrium. In physics and chemistry, however, a metastable state is a stable state, just not the most stable ground state, so a moderate perturbation will knock it from the metastable state to the ground state. 
(In the hill analogy, it's as if the ball is caught in a small basin partway down the hill.) ↩ 12. In case you're wondering what's going on with metastability at the circuit level, I'll give a brief explanation. A typical flip-flop is based on a latch circuit like the one below, which consists of two inverters and an electronic switch controlled by the clock. When the clock goes high, the inverters are configured into a loop, latching the prior input value. If the input was high, the output from the first inverter is low and the output from the second inverter is high. The loop feeds this output back into the first inverter, so the circuit is stable. Likewise, the circuit can be stable with a low input. A latch circuit. But what happens if the clock flips the switch as the input is changing, so the input to the first inverter is somewhere between zero and one? We need to consider that an inverter is really an analog device, not a binary device. You can describe it by a "voltage transfer curve" (purple line) that specifies the output voltage for a particular input voltage. For example, if you put in a low input, you get a high output, and vice versa. But there is an equilibrium point where the output voltage is the same as the input voltage. This is where metastability happens. The voltage transfer curve for a hypothetical inverter. Suppose the input voltage to the inverter is the equilibrium voltage. It's not going to be _precisely_ the equilibrium voltage (because of noise if nothing else), so suppose, for example, that it is 1µV above equilibrium. Note that the transfer curve is very steep around equilibrium, say a slope of 100, so it will greatly amplify the signal away from equilibrium. Thus, if the input is 1µV above equilibrium, the output will be 100µV below equilibrium. Then the next inverter will amplify again, sending a signal 10mV above equilibrium back to the first inverter. The distance will be amplified again, now 1000mV below equilibrium. 
At this point, you're on the flat part of the curve, so the second inverter will output +5V and the first inverter will output 0V, and the circuit is now stable. The point of this is that the equilibrium voltage is an _unstable_ equilibrium, so the circuit will eventually settle into the +5V or 0V states. But it may take an arbitrary number of loops through the inverters, depending on how close the starting point was to equilibrium. (The signal is continuous, so referring to "loops" is a simplification.) Also note that the distance from equilibrium is amplified exponentially with time. This is why the chance of metastability decreases exponentially with time. ↩ 13. Looking at the die shows that the pins with metastability protection are `INTR`, `NMI`, `PEREQ`, `ERROR#`, and `BUSY#`. The 80386 Hardware Reference Manual lists these same five pins as asynchronous—I like it when I spot something unusual on the die and then discover that it matches an obscure statement in the documentation. The interrupt pins `INTR` and `NMI` are asynchronous because they come from external sources that may not be using the 386's clock. But what about `PEREQ`, `ERROR#`, and `BUSY#`? These pins are part of the interface with an external math coprocessor (the 287 or 387 chip). In most cases, the coprocessor uses the 386's clock. However, the 387 supported a little-used asynchronous mode where the processor and the coprocessor could run at different speeds. ↩ 14. The 386's metastability flip-flop is constructed with an unusual circuit. It has two latch stages (which is normal), but instead of using two inverters in a loop, it uses a sense-amplifier circuit. The idea of the sense amplifier is that it takes a differential input. When the clock enables the sense amplifier, it drives the higher input high and the lower input low (the inputs are also the outputs). (Sense amplifiers are used in dynamic RAM chips to amplify the tiny signals from a RAM cell to form a 0 or 1. 
At the same time, the amplifier refreshes the DRAM cell by generating full voltages.) Note that the sense amplifier's inputs also act as outputs; inputs during clock phase 1 and outputs during phase 2. The schematic shows one of the latch stages; the complete flip-flop has a second stage, identical except that the clock phases are switched. This latch is much more complex than the typical 386 latch; 14 transistors versus 6 or 8.

The sense amplifier is similar to two inverters in a loop, except they share a limited power current and a limited ground current. As one inverter starts to go high, it "steals" the supply current from the other. Meanwhile, the other inverter "steals" the ground current. Thus, a small difference in inputs is amplified, just as in a differential amplifier. Thus, by combining the amplification of a differential amplifier with the amplification of the inverter loop, this circuit reaches its final state faster than a regular inverter loop.

In more detail, during the first clock phase, the two inverters at the top generate the inverted and non-inverted signals. (In a metastable situation, these will be close to the midpoint, not binary.) During the second clock phase, the sense amplifier is activated. You can think of it as a differential amplifier with cross-coupling. If one input is slightly higher than the other, the amplifier pulls that input higher and the other input lower, amplifying the difference. (The point is to quickly make the difference large enough to resolve the metastability.)

I couldn't find any latches like this in the literature. Comparative Analysis and Study of Metastability on High-Performance Flip-Flops describes eleven high-performance flip-flops. It includes two flip-flops that are based on sense amplifiers, but their circuits are very different from the 386 circuit. Perhaps the 386 circuit is an Intel design that was never publicized. In any case, let me know if this circuit has an official name. ↩
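The inverter-loop arithmetic in note 12 (1 µV becoming 100 µV, then 10 mV, then 1000 mV) can be sketched as a short loop. This is a toy model under stated assumptions: each inverter is treated as an ideal stage with a fixed gain of -100 around the equilibrium point, the signal moves in discrete passes rather than continuously, and the 0.5 V saturation threshold is a hypothetical stand-in for reaching the flat part of the transfer curve.

```python
def passes_to_resolve(initial_offset_v, gain=100.0, saturation_v=0.5):
    """Count inverter passes until the offset from the equilibrium voltage
    grows past saturation_v. Returns (passes, final_offset_v)."""
    if initial_offset_v == 0:
        raise ValueError("an offset of exactly zero never resolves in this toy model")
    offset = initial_offset_v
    passes = 0
    while abs(offset) < saturation_v:
        offset *= -gain  # each inverter flips the sign and amplifies the offset
        passes += 1
    return passes, offset

# Starting points ever closer to equilibrium (hypothetical values).
for start_v in (1e-6, 1e-9, 1e-12):
    passes, final = passes_to_resolve(start_v)
    side = "below" if final < 0 else "above"
    print(f"start {start_v:.0e} V above equilibrium: "
          f"{passes} passes, ends {side} equilibrium")
```

Each factor of 100 closer to equilibrium costs only one more pass, so the settling time grows with the logarithm of the starting distance; this is the same exponential relationship that makes long metastable events rare but never impossible.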

www.righto.com

August 25, 2025 at 5:07 AM

Ken Shirriff's blog

@righto.com.web.brid.gy

A CT scanner reveals surprises inside the 386 processor's ceramic package

Intel released the 386 processor in 1985, the first 32-bit chip in the x86 line. This chip was packaged in a ceramic square with 132 gold-plated pins protruding from the underside, fitting into a socket on the motherboard. While this package may seem boring, a lot more is going on inside it than you might expect. Lumafield performed a 3-D CT scan of the chip for me, revealing six layers of complex wiring hidden inside the ceramic package. Moreover, the chip has nearly invisible metal wires connected to the _sides_ of the package, the spikes below. The scan also revealed that the 386 has two separate power and ground networks: one for I/O and one for the CPU's logic.

A CT scan of the 386 package. The ceramic package doesn't show up in this image, but it encloses the spiky wires.

The package, below, provides no hint of the complex wiring embedded inside the ceramic. The silicon die is normally not visible, but I removed the square metal lid that covers it.1 As a result, you can also see the two tiers of gold contacts that surround the silicon die.

The 386 package with the lid over the die removed.

Intel selected the 132-pin ceramic package to meet the requirements of a high pin count, good thermal characteristics, and low-noise power to the die.2 However, standard packages didn't provide sufficient power, so Intel designed a custom package with "single-row double shelf bonding to two signal layers and four power and ground planes." In other words, the die's bond wires are connected to the two shelves (or tiers) of pads surrounding the die. Internally, the package is like a 6-layer printed-circuit board made from ceramic.

Package cross-section. Redrawn from "High Performance Technology, Circuits and Packaging for the 80386".

The photo below shows the two tiers of pads with tiny gold bond wires attached: I measured the bond wires at 35 µm in diameter, thinner than a typical human hair.
Some pads have up to five wires attached to support more current for the power and ground pads.

You can consider the package to be a hierarchical interface from the tiny circuits on the die to the much larger features of the computer's motherboard. Specifically, the die has a feature size of 1 µm, while the metal wiring on top of the die has 6 µm spacing. The chip's wiring connects to the chip's bond pads, which have 0.01" spacing (0.25 mm). The bond wires connect to the package's pads, which have 0.02" spacing (0.5 mm); double the spacing because there are two tiers. The package connects these pads to the pin grid with 0.1" spacing (2.54 mm). Thus, the scale expands by about a factor of 2500 from the die's microscopic circuitry to the chip's pins.

Close-up of the bond wires.

The ceramic package is manufactured through a complicated process.4 The process starts with flexible ceramic "green sheets", consisting of ceramic powder mixed with a binding agent. After holes for vias are created in the sheet, tungsten paste is silk-screened onto the sheet to form the wiring. The sheets are stacked, laminated under pressure, and then sintered at high temperature (1500ºC to 1600ºC) to create the rigid ceramic. The pins are brazed onto the bottom of the chip. Next, the pins and the inner contacts for the die are electroplated with gold.3 The die is mounted, gold bond wires are attached, and a metal cap is soldered over the die to encapsulate it. Finally, the packaged chip is tested, the package is labeled, and the chip is ready to be sold.

The diagram below shows a close-up of a signal layer inside the package. The pins are connected to the package's shelf pads through metal traces, spectacularly colored in the CT scan. (These traces are surprisingly wide and free-form; I expected narrower traces to reduce capacitance.) Bond wires connect the shelf pads to the bond pads on the silicon die. (The die image is added to the diagram; it is not part of the CT scan.)
The large red circles are vias from the pins. Some vias connect to this signal layer, while other vias pass through to other layers. The smaller red circles are connections to a power layer: because the shelf pads (and thus the bond wires) are only on the two signal layers, the four power and ground planes need connections to pads on the signal layers for bonding.

A close-up of a signal layer. The die image is pasted in.

The diagram below shows the corresponding portion of a power layer. A power layer looks completely different from a signal layer; it is a single conductive plane with holes. The grid of smaller holes allows the ceramic above and below this layer to bond, forming a solid piece of ceramic. The larger holes surround pin vias (red dots), allowing pin connections to pass through to a different layer. The red dots that contact the sheet are where power pins connect to this layer. Because the only connections to the die are from the signal layers, the power layers have connections to the signal layers; these are the smaller dots near the bond wires, either power vias passing through or vias connected to this layer.

A close-up of a power layer, specifically I/O Vss. The wavy blue regions are artifacts from neighboring layers. The die image is pasted in.

With the JavaScript tool below, you can look at the package, layer by layer. Click on a radio button to select a layer. By observing the path of a pin through the layers, you can see where it ends up. For instance, the upper left pin passes through multiple layers until the upper signals layer connects it to the die. The pin to its right passes through all the layers until it reaches the logic Vcc plane on top. (Vcc is the 5-volt supply that powers the chip, called Vcc for historical reasons.)
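The idea of following a pin's via up through the layers can be sketched as a small data structure. This is only an illustration: the bottom-to-top ordering is an assumption read off the layer selector's labels, the "lower"/"upper" names are added here to tell the two signal layers apart, and the function names are hypothetical.

```python
# Layer order from the pin side up to the lid side (assumed ordering).
LAYERS = [
    ("I/O Vcc", "power"),
    ("Signals (lower)", "signal"),
    ("I/O gnd", "power"),
    ("Signals (upper)", "signal"),
    ("Logic gnd", "power"),
    ("Logic Vcc", "power"),
]

def via_path(target):
    """Return the layers a pin's via reaches, ending at the layer it connects to."""
    names = [name for name, _ in LAYERS]
    if target not in names:
        raise ValueError(f"unknown layer: {target}")
    return names[: names.index(target) + 1]

def needs_hop_to_signal_layer(target):
    """Power and ground planes have no shelf pads, so reaching the die
    requires a further connection to a pad on a signal layer."""
    return dict(LAYERS)[target] == "power"

if __name__ == "__main__":
    for layer in ("Signals (upper)", "Logic Vcc"):
        print(layer, "<-", " -> ".join(via_path(layer)),
              "| extra hop to a signal layer:", needs_hop_to_signal_layer(layer))
```

With this ordering, a pin that connects on the upper signal layer passes three other layers first, while a logic Vcc pin passes through the whole stack, matching the two example pins described above.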
Layers selectable in the tool: Pins, I/O Vcc, Signals, I/O gnd, Signals, Logic gnd, Logic Vcc.

If you select the logic Vcc plane above, you'll see a bright blotchy square in the center. This is not the die itself, I think, but the adhesive that attaches the die to the package, epoxy filled with silver to provide thermal and electrical conductivity. Since silver blocks X-rays, it is highly visible in the image.

## Side contacts for electroplating

What surprised me most about the scans was seeing wires that stick out to the sides of the package. These wires are used during manufacturing when the pins are electroplated with gold.5 In order to electroplate the pins, each pin must be connected to a negative voltage so it can function as a cathode. This is accomplished by giving each pin a separate wire that goes to the edge of the package.

The diagram below compares the CT scan (above) to a visual side view of the package (below). The wires are almost invisible, but can be seen as darker spots. The arrows show how three of these spots match with the CT scan; you can match up the other spots.6

A close-up of the side of the package compared to the CT scan, showing the edge contacts. I lightly sanded the edge of the package to make the contacts more visible. Even so, they are almost invisible.

## Two power networks

According to the datasheet, the 386 has 20 pins connected to +5V power (Vcc) and 21 pins connected to ground (Vss). Studying the die, I noticed that the I/O circuitry in the 386 has separate power and ground connections from the logic circuitry. The motivation is that the output pins require high-current driver circuits. When a pin switches from 0 to 1 or vice versa, this can cause a spike on the power and ground wiring. If this spike is too large, it can interfere with the processor's logic, causing malfunctions. The solution is to use separate power wiring inside the chip for the I/O circuitry and for the logic circuitry, connected to separate pins.
On the motherboard, these pins are all connected to the same power and ground, but decoupling capacitors absorb the I/O spikes before they can flow into the chip's logic. The diagram below shows how the two power and ground networks look on the die, with separate pads and wiring. The square bond pads are at the top, with dark bond wires attached. The white lines are the two layers of metal wiring, and the darker regions are circuitry. Each I/O pin has a driver circuit below it, consisting of relatively large transistors to pull the pin high or low. This circuitry is powered by the horizontal lines for I/O Vcc (light red) and I/O ground (Vss, light blue). Underneath each I/O driver is a small logic circuit, powered by thinner Vcc (dark red) and Vss (dark blue). Thicker Vss and Vcc wiring goes to the logic in the rest of the chip. Thus, if the I/O circuitry causes power fluctuations, the logic circuit remains undisturbed, protected by its separate power wiring. A close-up of the top of the die, showing the power wiring and the circuitry for seven data pins. The datasheet doesn't mention the separate I/O and logic power networks, but by using the CT scans, I determined which pins power I/O, and which pins power logic. In the diagram below, the light red and blue pins are power and ground for I/O, while the dark red and blue pins are power and ground for logic. The pins are scattered across the package, allowing power to be supplied to all four sides of the die. The pinout from the Intel386DX Microprocessor Datasheet. This is the view from the pin side. ## "No Connect" pins As the diagram above shows, the 386 has eight pins labeled "NC" (No Connect)—when the chip is installed in a computer, the motherboard must leave these pins unconnected. You might think that the 132-pin package simply has eight extra, unneeded pins, but it's more complicated than that. The photo below shows five bond pads at the bottom of the 386 die. 
Three of these pads have bond wires attached, but two have no bond wires: these correspond to No Connect pins. Note the black marks in the middle of the pads: the marks are from test probes that were applied to the die during testing.7 The No Connect pads presumably have a function during this testing process, providing access to an important internal signal. A close-up of the die showing three bond pads with bond wires and two bond pads without bond wires. Seven of the eight No Connect pads are _almost_ connected: the package has a spot for a bond wire in the die cavity and the package has internal wiring to a No Connect pin. The only thing missing is the bond wire between the pad and the die cavity. Thus, by adding bond wires, Intel could easily create special chips with these pins connected, perhaps for debugging the test process itself. The surprising thing is that one of the No Connect pads _does_ have the bond wire in place, completing the connection to the external pin. (I marked this pin in green in the pinout diagram earlier.) From the circuitry on the die, this pin appears to be an output. If someone with a 386 chip hooks this pin to an oscilloscope, maybe they will see something interesting. ## Labeling the pads on the die The earlier 8086 processor, for example, is packaged in a DIP (Dual-Inline Package) with two rows of pins. This makes it straightforward to figure out which pin (and thus which function) is connected to each pad on the die. However, since the 386 has a two-dimensional grid of pins, the mapping to the pads is unclear. You can guess that pins are connected to a nearby pad, but ambiguity remains. Without knowing the function of each pad, I have a harder time reverse-engineering the die. In fact, my primary motivation for scanning the 386 package was to determine the pin-to-pad mapping and thus the function of each pad.8 Once I had the CT data, I was able to trace out each hidden connection between the pad and the external pin. 
The image below shows some of the labels; click here for the full, completely labeled image. As far as I know, this information hasn't been available outside Intel until now. A close-up of the 386 die showing the labels for some of the pins. ## Conclusions Intel's early processors were hampered by inferior packages, but by the time of the 386, Intel had realized the importance of packaging. In Intel's early days, management held the bizarre belief that chips should never have more than 16 pins, even though other companies used 40-pin packages. Thus, Intel's first microprocessor, the 4004 (1971), was crammed into a 16-pin package, limiting its performance. By 1972, larger memory chips forced Intel to move to 18-pin packages, extremely reluctantly.9 The eight-bit 8008 processor (1972) took advantage of this slightly larger package, but performance still suffered because signals were forced to share pins. Finally, Intel moved to the standard 40-pin package for the 8080 processor (1974), contributing to the chip's success. In the 1980s, pin-grid arrays became popular in the industry as chips required more and more pins. Intel used a ceramic pin grid array (PGA) with 68 pins for the 186 and 286 processors (1982), followed by the 132-pin package for the 386 (1985). The main drawback of the ceramic package was its cost. According to the 386 oral history, the cost of the 386 die decreased over time to the point where the chip's package cost as much as the die. To counteract this, Intel introduced a low-cost plastic package for the 386 that cost just a dollar to manufacture, the Plastic Quad Flat Package (PQFP) (details). In later Intel processors, the number of connections exponentially increased. A typical modern laptop processor uses a Ball Grid Array with 2049 solder balls; the chip is soldered directly onto the circuit board. Other Intel processors use a Land Grid Array (LGA): the chip has flat contacts called lands, while the _socket_ has the pins. 
Some Xeon processors have 7529 contacts, a remarkable growth from the 16 pins of the Intel 4004.

From the outside, the 386's package looks like a plain chunk of ceramic. But the CT scan revealed surprising complexity inside, from numerous contacts for electroplating to six layers of wiring. Perhaps even more secrets lurk in the packages of modern processors.

Follow me on Bluesky (@righto.com), Mastodon (@kenshirriff@oldbytes.space), or RSS. (I've given up on Twitter.) Thanks to Jon Bruner and Lumafield for scanning the chip. Lumafield's interactive CT scan of the 386 package is available here if you want to examine it yourself. Lumafield also scanned a 1960s cordwood flip-flop and the Soviet Globus spacecraft navigation instrument for us. Thanks to John McMaster for taking 2D X-rays.

## Notes and references

1. I removed the metal lid with a chisel, as hot air failed to desolder the lid. A few pins were bent in the process, but I straightened them out, more or less. ↩

2. The 386 package is described in "High Performance Technology, Circuits and Packaging for the 80386", Proceedings, ICCD Conference, Oct. 1986. (Also see Design and Test of the 80386 by Pat Gelsinger, former Intel CEO.) The paper gives the following requirements for the 386 package:

   1. Large pin count to handle separate 32-bit data and address buses.
   2. Thermal characteristics resulting in junction temperatures under 110°C.
   3. Power supply to the chip and I/O able to supply 600mA/ns with noise levels less than 0.4V (chip) and less than 0.8V (I/O).

   The first and second criteria motivated the selection of a 132-pin ceramic pin grid array (PGA). The custom six-layer package was designed to achieve the third objective. The power network is claimed to have an inductance of 4.5 nH per power pad on the device, compared to 12-14 nH for a standard package, about a factor of 3 better. The paper states that logic Vcc, logic Vss, I/O Vcc, and I/O Vss each have 10 pins assigned.
Curiously, the datasheet states that the 386 has 20 Vcc pins and _21_ Vss pins, which doesn't add up. From my investigation, the "extra" pin is assigned to logic Vss, which has 11 pins. ↩ 3. I estimate that the 386 package contains roughly 0.16 grams of gold, currently worth about $16. It's hard to find out how much gold is in a processor since online numbers are all over the place. Many people recover the gold from chips, but the amount of gold one can recover depends on the process used. Moreover, people tend to keep accurate numbers to themselves so they can profit. But I made some estimates after searching around a bit. One person reports 9.69g of gold per kilogram of chips, and other sources seem roughly consistent. A ceramic 386 reportedly weighs 16g. This works out to 160 mg of gold per 386. ↩ 4. I don't have information on Intel's package manufacturing process specifically. This description is based on other descriptions of ceramic packages, so I don't guarantee that the details are correct for the 386. A Fujitsu patent, Package for enclosing semiconductor elements, describes in detail how ceramic packages for LSI chips are manufactured. IBM's process for ceramic multi-chip modules is described in Multi-Layer Ceramics Manufacturing, but it is probably less similar. ↩ 5. An IBM patent, Method for shorting pin grid array pins for plating, describes the prior art of electroplating pins with nickel and/or gold. In particular, it describes using leads to connect all input/output pins to a common bus at the edge of the package, leaving the long leads in the structure. This is exactly what I see in the 386 chip. The patent mentions that a drawback of this approach is that the leads can act as antennas and produce signal cross-talk. Fujitsu patent Package for enclosing semiconductor elements also describes wires that are exposed at side surfaces. This patent covers methods to avoid static electricity damage through these wires. 
(Picking up a 386 by the sides seems safe, but I guess there is a risk of static damage.) Note that each input/output pin requires a separate wire to the edge. However, the multiple pins for each power or ground plane are connected inside the package, so they do not require individual edge connections; one or two suffice. ↩ 6. To verify that the wires from pins to the edges of the chip exist and are exposed, I used a multimeter and found connectivity between pins and tiny spots on the sides of the chip. ↩ 7. To reduce costs, each die is tested while it is still part of the silicon wafer and each faulty die is marked with an ink spot. The wafer is "diced", cutting it apart into individual dies, and only the functional, unmarked dies are packaged, avoiding the cost of packaging a faulty die. Additional testing takes place after packaging, of course. ↩ 8. I tried several approaches to determine the mapping between pads and pins before using the CT scan. I tried to beep out the connections between the pins and the pads with a multimeter, but because the pads are so tiny, the process was difficult, error-prone, and caused damage to the package. I also looked at the pinout of the 386 in a plastic package (datasheet). Since the plastic package has the pins in a single ring around the border, the mapping to the die is straightforward. Unfortunately, the 386 die was slightly redesigned at this time, so some pads were moved around and new pins were added, such as `FLT#`. It turns out that the pinout for the plastic chip _almost_ matches the die I examined, but not quite. ↩ 9. In his oral history, Federico Faggin, a designer of the 4004, 8008, and Z80 processors, describes Intel's fixation on 16-pin packages. When a memory chip required 18 pins instead of 16, it was "like the sky had dropped from heaven. I never seen so [many] long faces at Intel, over this issue, because it was a religion in Intel; everything had to be 16 pins, in those days. 
It was a completely silly requirements [sic] to have 16 pins." At the time, other manufacturers were using 40- and 48-pin packages, so there was no technical limitation, just a minor cost saving from the smaller package. ↩
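Several figures quoted in the article and in notes 2 and 3 can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and the last line simply shows the price per gram implied by the quoted $16 rather than an independent gold price.

```python
# Scale hierarchy: die feature size (1 µm) to pin pitch (0.1" = 2.54 mm).
feature_um = 1.0
pin_pitch_um = 2.54 * 1000
scale_factor = pin_pitch_um / feature_um

# Gold estimate from note 3: 9.69 g of gold per kg of chips, 16 g per chip.
gold_g_per_kg_chips = 9.69
chip_mass_g = 16.0
gold_mg = gold_g_per_kg_chips * chip_mass_g  # (g/kg) * g gives milligrams

# Power pins from note 2: 10 per network, plus the extra logic Vss pin.
pins = {"logic Vcc": 10, "I/O Vcc": 10, "logic Vss": 11, "I/O Vss": 10}
vcc_total = pins["logic Vcc"] + pins["I/O Vcc"]
vss_total = pins["logic Vss"] + pins["I/O Vss"]

# Power-network inductance: custom package vs. the quoted standard range.
custom_nh = 4.5
standard_nh = (12.0, 14.0)
improvement = tuple(s / custom_nh for s in standard_nh)

if __name__ == "__main__":
    print(f"scale factor: {scale_factor:.0f}x")
    print(f"gold per chip: {gold_mg:.0f} mg")
    print(f"Vcc pins: {vcc_total}, Vss pins: {vss_total}")
    print(f"inductance improvement: {improvement[0]:.1f}x to {improvement[1]:.1f}x")
    print(f"implied gold price: ${16 / 0.16:.0f} per gram")
```

The results agree with the text: a scale factor of about 2500, roughly 160 mg of gold, 20 Vcc and 21 Vss pins, and an inductance improvement of about a factor of 3.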

www.righto.com

August 17, 2025 at 5:01 AM

Ken Shirriff's blog

@righto.com.web.brid.gy

How to reverse engineer an analog chip: the TDA7000 FM radio receiver

www.righto.com

August 14, 2025 at 5:02 AM

Ken Shirriff's blog

@righto.com.web.brid.gy

Reverse engineering the mysterious Up-Data Link Test Set from Apollo

www.righto.com

August 1, 2025 at 4:54 AM

Ken Shirriff's blog

@righto.com.web.brid.gy

Inside the Apollo "8-Ball" FDAI (Flight Director / Attitude Indicator)

During the Apollo flights to the Moon, the astronauts observed the spacecraft's orientation on a special instrument called the FDAI (Flight Director / Attitude Indicator). This instrument showed the spacecraft's attitude—its orientation—by rotating a ball. This ball was nicknamed the "8-ball" because it was black (albeit only on one side). The instrument also acted as a flight director, using three yellow needles to indicate how the astronauts should maneuver the spacecraft. Three more pointers showed how fast the spacecraft was rotating. An Apollo FDAI (Flight Director/Attitude Indicator) with the case removed. This FDAI is on its side to avoid crushing the needles. Since the spacecraft rotates along three axes (roll, pitch, and yaw), the ball also rotates along three axes. It's not obvious how the ball can rotate to an arbitrary orientation while remaining attached. In this article, I look inside an FDAI from Apollo that was repurposed for a Space Shuttle simulator1 and explain how it operates. (Spoiler: the ball mechanism is firmly attached at the "equator" and rotates in two axes. What you see is two hollow shells around the ball mechanism that spin around the third axis.) ## The FDAI in Apollo For the missions to the Moon, the Lunar Module had two FDAIs, as shown below: one on the left for the Commander (Neil Armstrong in Apollo 11) and one on the right for the Lunar Module Pilot (Buzz Aldrin in Apollo 11). With their size and central positions, the FDAIs dominate the instrument panel, a sign of their importance. (The Command Module for Apollo also had two FDAIs, but with a different design; I won't discuss them here.2) The instrument panel in the Lunar Module. From Apollo 15 Lunar Module, NASA, S71-40761. If you're looking for the DSKY, it is in the bottom center, just out of the picture. 
Each Lunar Module FDAI could display inputs from multiple sources, selected by switches on the panel.3 The ball could display attitude from either the Inertial Measurement Unit or from the backup Abort Guidance System, selected by the "ATTITUDE MON" toggle switch next to either FDAI. The pitch attitude could also be supplied by an electromechanical unit called ORDEAL (Orbital Rate Display Earth And Lunar) that simulates a circular orbit. The error indications came from the Apollo Guidance Computer, the Abort Guidance System, the landing radar, or the rendezvous radar (controlled by the "RATE/ERROR MON" switches). The pitch, roll, and yaw rate displays were driven by the Rate Gyro Assembly (RGA). The rate indications were scaled by a switch below the FDAI, selecting 25°/sec or 5°/sec. ## The FDAI mechanism The ball inside the indicator shows rotation around three axes. I'll first explain these axes in the context of an aircraft, since the axes of a spacecraft are more arbitrary.4 The roll axis indicates the aircraft's angle if it rolls side-to-side along its axis of flight, raising one wing and lowering the other. Thus, the indicator shows the tilt of the horizon as the aircraft rolls. The pitch axis indicates the aircraft's angle if it pitches up or down, with the indicator showing the horizon moving down or up in response. Finally, the yaw axis indicates the compass direction that the aircraft is heading, changing as the aircraft turns left or right. (A typical aircraft attitude indicator omits yaw.) I'll illustrate how the FDAI rotates the ball in three axes, using an orange as an example. Imagine pinching the horizontal axis between two fingers with your arm extended. Rotating your arm will roll the ball counter-clockwise or clockwise (red arrow). In the FDAI, this rotation is accomplished by a motor turning the frame that holds the ball. For pitch, the ball rotates forward or backward around the horizontal axis (yellow arrow). 
The FDAI has a motor inside the ball to produce this rotation. Yaw is a bit more difficult to envision: imagine hemisphere-shaped shells attached to the top and bottom shafts. When a motor rotates these shells (green arrow), the hemispheres will rotate, even though the ball mechanism (the orange) remains stationary. A sphere, showing the three axes. The diagram below shows the mechanism inside the FDAI. The indicator uses three motors to move the ball. The roll motor is attached to the FDAI's frame, while the pitch and yaw motors are inside the ball. The roll motor rotates the roll gimbal through gears, causing the ball to rotate clockwise or counterclockwise. The roll gimbal is attached to the ball mechanism at two points along the "equator"; these two points define the pitch axis. Numerous wires on the roll gimbal enter the ball along the pitch axis. The roll control transformer provides position feedback, as will be explained below. The main components inside the FDAI. Removing the hemispherical shells reveals the mechanism inside the ball. When the roll gimbal is rotated, this mechanism rotates with it. The pitch motor causes the ball mechanism to rotate around the pitch axis. The yaw motor and control transformer are not visible in this photo; they are behind the pitch components, oriented perpendicularly. The yaw motor turns the vertical shaft, with the two hemisphere shells attached to the top and bottom of the shaft. Thus, the yaw motor rotates the ball shells around the yaw axis, while the mechanism itself remains stationary. The control transformers for pitch and yaw provide position feedback. The components inside the ball of the FDAI. Why doesn't the wiring get tangled up as the ball rotates? The solution is two sets of slip rings to implement the electrical connections. The photo below shows the first slip ring assembly, which handles rotation around the roll axis. These slip rings connect the stationary part of the FDAI to the rotating roll gimbal. 
The vertical metal brushes are stationary; there are 23 pairs of brushes, one for each connection to the ball mechanism. Each pair of brushes contacts one metal ring on the striped shaft, maintaining contact as the shaft rotates. Inside the shaft, 23 wires connect the circular metal contacts to the roll gimbal. The slip ring assembly in the FDAI. A second set of slip rings inside the ball handles rotation around the pitch axis. These rings provide the electrical connection between the wiring on the roll gimbal and the ball mechanism. The yaw axis does not use slip rings since only the hemisphere shells rotate around the yaw axis; no wires are involved. ## Synchros and the servo loop In this section, I'll explain how the FDAI is controlled by synchros and servo loops. In the 1950s and 1960s, the standard technique for transmitting a rotational signal electrically was through a synchro. Synchros were used for everything from rotating an instrument indicator in avionics to rotating the gun on a navy battleship. A synchro produces an output that depends on the shaft's rotational position, and transmits this output signal on three wires. If you connect these wires to a second synchro, you can use the first synchro to control the second one: the shaft of the second synchro will rotate to the same angle as the first shaft. Thus, synchros are a convenient way to send a control signal electrically. The photo below shows a typical synchro, with the input shaft on the top and five wires at the bottom: two for power and three for the output. A synchro transmitter. Internally, the synchro has a rotating winding called the rotor that is driven with 400 Hz AC. Three fixed stator windings provide the three AC output signals. As the shaft rotates, the voltages of the output signals change, indicating the angle. (A synchro resembles a transformer with three variable secondary windings.) 
If two connected synchros have different angles, the magnetic fields create a torque that rotates the shafts into alignment.

The schematic symbol for a synchro transmitter or receiver.

The downside of synchros is that they don't produce a lot of torque. The solution is to use a more powerful motor, controlled by the synchro and a feedback loop called a servo loop. The servo loop drives the motor in the appropriate direction to eliminate the error between the desired position and the current position.

The diagram below shows how the servo loop is constructed from a combination of electronics and mechanical components. The goal is to rotate the output shaft to an angle that exactly matches the input angle, specified by the three synchro wires. The control transformer compares the input angle and the output shaft position, producing an error signal. The amplifier uses this error signal to drive the motor in the appropriate direction until the error signal drops to zero. To improve the dynamic response of the servo loop, the tachometer signal is used as a negative feedback voltage. The feedback slows the motor as the system gets closer to the right position, so the motor doesn't overshoot the position and oscillate. (This is sort of like a PID controller.)

This diagram shows the structure of the servo loop, with a feedback loop ensuring that the rotation angle of the output shaft matches the input angle.

A control transformer is similar to a synchro in appearance and construction, but the rotating shaft operates as an input, not the output. In a control transformer, the three stator windings receive the inputs and the rotor winding provides the error output. If the rotor angles of the synchro transmitter and control transformer are the same, the signals cancel out and there is no error voltage. But as the difference between the two shaft angles increases, the rotor winding produces an error signal. The phase of the error signal indicates the direction of the error.
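The servo loop described above can be sketched numerically. This is an idealized model under stated assumptions, not the FDAI's actual behavior: a signed number stands in for the magnitude and phase of each 400 Hz signal, the three stator signals are taken as sines spaced 120° apart, and the gains, friction, unit inertia, and function names are all invented for illustration.

```python
import math

STATOR_OFFSETS = (0.0, 2 * math.pi / 3, 4 * math.pi / 3)

def synchro_stator(theta):
    """Signed amplitudes of the three stator signals for shaft angle theta
    (radians). The sign stands in for the AC phase (in phase or inverted)."""
    return [math.sin(theta + a) for a in STATOR_OFFSETS]

def control_transformer(stator, phi):
    """Signed error amplitude on the rotor of a control transformer at angle
    phi. For the idealized stator signals above, this is sin(theta - phi)."""
    return (2.0 / 3.0) * sum(
        s * math.cos(phi + a) for s, a in zip(stator, STATOR_OFFSETS)
    )

def run_servo(theta_in, tach_gain, amp_gain=40.0, friction=0.5,
              dt=1e-3, t_end=5.0):
    """Simulate the loop from rest; return (final angle, peak overshoot) in degrees."""
    stator = synchro_stator(theta_in)
    phi, omega, peak = 0.0, 0.0, 0.0
    for _ in range(int(t_end / dt)):
        error = control_transformer(stator, phi)
        # Amplifier: error signal minus tachometer (speed) feedback.
        drive = amp_gain * error - tach_gain * omega
        omega += (drive - friction * omega) * dt  # motor with unit inertia
        phi += omega * dt
        peak = max(peak, phi - theta_in)
    return math.degrees(phi), math.degrees(peak)

if __name__ == "__main__":
    target = math.radians(60)
    for gain in (0.0, 14.0):
        final, overshoot = run_servo(target, tach_gain=gain)
        print(f"tach gain {gain:4.1f}: final {final:6.2f} deg, "
              f"overshoot {overshoot:5.2f} deg")
```

With the tachometer gain set to zero, the modeled shaft swings well past the target and rings; with the feedback enabled, it settles on the commanded angle without overshoot, which is the damping role the text attributes to the tachometer signal.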
In the FDAI, the motor is a special motor/tachometer, a device that was often used in avionics servo loops. This motor is more complicated than a regular electric motor. The motor is powered by 115 volts AC at 400 hertz, but this won't spin the motor on its own. The motor also has two low-voltage control windings. Energizing the control windings with the proper phase causes the motor to spin in one direction or the other. The motor/tachometer unit also contains a tachometer to measure its speed for the feedback loop. The tachometer is driven by another 115-volt AC winding and generates a low-voltage AC signal that is proportional to the motor's rotational speed. A motor/tachometer similar (but not identical) to the one in the FDAI. The photo above shows a motor/tachometer with the rotor removed. The unit has many wires because of its multiple windings. The rotor has two drums. The drum on the left, with the spiral stripes, is for the motor. This drum is a "squirrel-cage rotor", which spins due to induced currents. (There are no electrical connections to the rotor; the drums interact with the windings through magnetic fields.) The drum on the right is the tachometer rotor; it induces a signal in the output winding proportional to the speed due to eddy currents. The tachometer signal is at 400 Hz like the driving signal, either in phase or 180º out of phase, depending on the direction of rotation. For more information on how a motor/tachometer works, see my teardown. ## The amplifiers The FDAI has three servo loops—one for each axis—and each servo loop has a separate control transformer, motor, and amplifier. The photo below shows one of the three amplifier boards. The construction is unusual and somewhat chaotic, with some components stacked on top of others to save space. Some of the component leads are long and protected with clear plastic sleeves.5 The cylindrical pulse transformer in the middle has five colorful wires coming out of it. 
At the left are the two transistors that drive the motor's control windings, with two capacitors between them. The transistors are mounted on a heat sink that is screwed down to the case of the amplifier assembly for cooling. Each amplifier is connected to the FDAI through seven wires with pins that plug into the sockets on the right of the board.6 One of the three amplifier boards. At the right front of the board, you can see a capacitor stacked on top of a resistor. The board is shiny because it is covered with conformal coating. The function of the board is to amplify the error signal so the motor rotates in the appropriate direction. The amplifier also uses the tachometer output from the motor unit to slow the motor as the error signal decreases, preventing overshoot. The inputs to the amplifier are 400 hertz AC signals, with the magnitude indicating the amount of error or speed and the phase indicating the direction. The two outputs from the amplifier drive the two control windings of the motor, determining which direction the motor rotates. The schematic for the amplifier board is below. 7 The two transistors on the left amplify the error and tachometer signals, driving the pulse transformer. The outputs of the pulse transformer will have opposite phases, driving the output transistors for opposite halves of the 400 Hz cycle. This activates the motor control winding, causing the motor to spin in the desired direction.8 The schematic of an amplifier board. ## History of the FDAI Bill Lear, born in 1902, was a prolific inventor with over 150 patents, creating everything from the 8-track tape to the Learjet, the iconic private plane of the 1960s. 
He created multiple companies in the 1920s as well as inventing one of the first car radios for Motorola before starting Lear Avionics, a company that specialized in aerospace instruments.9 Lear produced innovative aircraft instruments and flight control systems such as the F-5 automatic pilot, which received a trophy as the "greatest aviation achievement in America" for 1950. Bill Lear went on to solve an indicator problem for the Air Force: the supersonic F-102 Delta Dagger interceptor (1953) could climb at steep angles, but existing attitude indicators could not handle nearly vertical flight. Lear developed a remote two-gyro platform that drove the cockpit indicator while avoiding "gimbal lock" during vertical flight. For the experimental X-15 rocket-powered aircraft (1959), Lear improved this indicator to handle three axes: roll, pitch, and yaw. Meanwhile, the Siegler Corporation started in 1950 to manufacture space heaters for homes. A few years later, Siegler was acquired by John Brooks, an entrepreneur who was enthusiastic about acquisitions. In 1961, Lear Avionics became his latest acquisition, and the merged company was called Lear Siegler Incorporated, often known as LSI. (Older programmers may know Lear Siegler through the ADM-3A, an inexpensive video display terminal from 1976 that housed the display and keyboard in a stylish white case.) The X-15's attitude indicator became the basis of the indicator for the F-4 fighter plane (the ARU/11-A). Then, after "a minimum of modification", the attitude-director indicator was used in the Gemini space program. In total, Lear Siegler provided 11 instruments in the Gemini instrument panel, with the attitude director the most important. 
Next, Gemini's indicator was modified to become the FDAI (flight director-attitude indicator) in the Lunar Module for Apollo.10 Lear Siegler provided numerous components for the Apollo program, from a directional gyro for the Lunar Rover to the electroluminescent display for the Apollo Guidance Computer's Display/Keyboard (DSKY). An article titled "LSI Instruments Aid in Moon Landing" from LSI's internal LSI Log publication, July 1969. (Click for a larger version.) In 1974, Lear Siegler obtained a contract to develop the Attitude-Director Indicator (ADI) for the Space Shuttle, producing a dozen ADI units for the Space Shuttle. However, by this time, Lear Siegler was losing enthusiasm for low-volume space avionics. The Instrument Division president said that "the business that we were in was an engineering business and engineers love a challenge." However, manufacturing refused to deal with the special procedures required for space manufacturing, so the Shuttle units were built by the engineering department. Lear Siegler didn't bid on later Space Shuttle avionics and the Shuttle ADI became its last space product. In the early 2000s, the Space Shuttle's instruments were upgraded to a "glass cockpit" with 11 flat-panel displays known as the Multi-function Electronic Display System (MEDS). The MEDS was produced by Lear Siegler's long-term competitor, Honeywell. Getting back to Bill Lear, he wanted to manufacture aircraft, not just aircraft instruments, so he created the Learjet, the first mass-produced business jet. The first Learjet flew in 1963, with over 3000 eventually delivered. In the early 1970s, Lear designed a steam turbine automobile engine. Rather than water, the turbine used a secret fluorinated hydrocarbon called "Learium". Lear had visions of thousands of low-pollution "Learmobiles", but the engine failed to catch on. 
Lear had been on the verge of bankruptcy in the 1960s; one of his VPs explained that "the great creative minds can't be bothered with withholding taxes and investment credits and all this crap". But by the time of his death in 1978, Lear had a fortune estimated at $75 million. ## Comparing the ARU/11-A and the FDAI Looking inside our FDAI sheds more details on the evolution of Lear Siegler's attitude directors. The photo below compares the Apollo FDAI (top) to the earlier ARU/11-A used in the F-4 aircraft (bottom). While the basic mechanism and the electronic amplifiers are the same between the two indicators, there are also substantial changes. Comparison of an FDAI (top) with an ARU-11/A (bottom). The amplifier boards and needles have been removed from the FDAI. The biggest difference between the ARU-11/A indicator and the FDAI is that the electronics for the ARU-11/A are in a separate module that was plugged into the back of the indicator, while the FDAI includes the electronics internally, with boards mounted on the instrument frame. Specifically, the ARU-11/A has a separate unit containing a multi-winding transformer, a power supply board, and three amplifier boards (one for each axis), while the FDAI contains these components internally. The amplifier boards in the ARU-11/A and the FDAI are identical, constructed from germanium transistors rather than silicon.11 The unusual 11-pin transformers are also the same. However, the power supply boards are different, probably because the boards also contain scaling resistors that vary between the units.12 The power supply boards are also different shapes to fit the available space. The ball assemblies of the ARU/11-A and the FDAI are almost the same, with the same motor assemblies and slip ring mechanism. The gearing has minor changes. In particular, the FDAI has two plastic gears, while the ARU/11-A uses exclusively metal gears. 
The ARU/11-A has a patented pitch trim feature that was mostly—but not entirely—removed from the Apollo FDAI. The motivation for this feature is that an aircraft in level flight will be pitched up a few degrees, the "angle of attack". It is desirable for the attitude indicator to show the aircraft as horizontal, so a pitch trim knob allows the angle of attack to be canceled out on the display. The problem is that if you fly your fighter plane vertically, you want the indicator to show precisely vertical flight, rather than applying the pitch trim adjustment. The solution in the ARU-11/A is a special 8-zone potentiometer on the pitch axis that will apply the pitch trim adjustment in level flight but not in vertical flight, while providing a smooth transition between the regions. This special potentiometer is mounted inside the ball of the ARU-11/A. However, this pitch trim adjustment is meaningless for a spacecraft, so it is not implemented in the Apollo or Space Shuttle instruments. Surprisingly, the shell of the potentiometer still exists in our FDAI, but without the potentiometer itself or the wiring. Perhaps it remained to preserve the balance of the ball. In the photo below, the cylindrical potentiometer shell is indicated by an arrow. Note the holes in the front of the shell; in the ARU-11/A, the potentiometer's wiring terminals protrude through these holes, but in the FDAI, the holes are covered with tape. Inside the ball of the FDAI. The potentiometer shell is indicated with an arrow. Finally, the mounting of the ball hemispheres is slightly different. The ARU/11-A uses four screws at the pole of each hemisphere. Our FDAI, however, uses a single screw at each pole; the screw is tightened with a Bristol Key, causing the shaft to expand and hold the hemisphere in place. To summarize, the Apollo FDAI occupies a middle ground: while it isn't simply a repurposed ARU-11/A, neither is it a complete redesign. 
Instead, it preserves the old design where possible, while stripping out undesired features such as pitch trim. The separate amplifier and mechanical units of the ARU/11-A were combined to form the larger FDAI. ## Differences from Apollo The FDAI that we examined is a special unit: it was originally built for Apollo but was repurposed for a Space Shuttle simulator. Our FDAI is labeled Model 4068F, which is a Lunar Module part number. Moreover, the FDAI is internally stamped with the date "Apr. 22 1968", over a year before the first Moon landing. However, a closer look shows that several key components were modified to make the Apollo FDAI work in the Shuttle Simulator.14 The Apollo FDAI (and the Shuttle ADI) used resolvers as inputs to control the ball, while our FDAI uses synchros. (Resolvers and synchros are similar, except resolvers use sine and cosine inputs, 90° apart, on two wire pairs, while synchros use three inputs, 120° apart, on three wires.) NASA must have replaced the three resolver control transformers in the FDAI with synchro control transformers for use in the simulator. The Apollo FDAI used electroluminescent lighting for the display, while ours uses eight small incandescent bulbs. The metal case of our FDAI has a Dymo embossed tape label "INCANDESCENT LIGHTING", alerting users to the change from Apollo's illumination. Our FDAI also contains a step-down transformer to convert the 115 VAC input into 5 VAC to power the bulbs, while the Shuttle powered its ADI illumination directly from 5 volts. The dial of our FDAI was repainted to match the dial of the Shuttle FDAI. The Apollo FDAI had red bands on the left and right of the dial. A close examination of our dial shows that black paint was carefully applied over the red paint, but a few specks of red paint are still visible (below). Moreover, the edges of the lines and the lozenge show slight unevenness from the repainting. 
Second, the Apollo FDAI had the text "ROLL RATE", "PITCH RATE", and "YAW RATE" in white next to the needle scales. In our FDAI, this text has been hidden by black paint to match the Shuttle display.13 Third, the Apollo LM FDAI had a crosshair in the center of the instrument, while our FDAI has a white U-shaped indicator, the same as the Shuttle (and the Command Module's FDAI). Finally, the ball of the Apollo FDAI has red circular regions at the poles to warn of orientations that can cause gimbal lock. Our FDAI (like the Shuttle) does not have these circles. We couldn't see any evidence that these regions were repainted, so we suspect that our FDAI has Shuttle hemispheres on the ball. A closeup of the dial on our FDAI shows specks of red paint around the dial markings. The color is probably Switzer DayGlo Rocket Red. Our FDAI has also been modified electrically. Small green connectors (Micro-D MDB1) have been added between the slip rings and the motors, as well as on the gimbal arm. We think these connectors were added post-Apollo, since they are attached somewhat sloppily with glue and don't look flight-worthy. Perhaps these connectors were added to make disassembly and modification easier. Moreover, our FDAI has an elapsed time indicator, also mounted with glue. The back of our FDAI is completely different from Apollo. First, the connector's pinout is completely different. Second, each of the six indicator needles has a mechanical adjustment as well as a trimpot (details). Finally, each of the three axes has an adjustment potentiometer. ## The Shuttle's ADI (Attitude Director Indicator) Each Space Shuttle had three ADIs (Attitude Director Indicators), which were very similar to the Apollo FDAI, despite the name change. The photo below shows the two octagonal ADIs in the forward flight deck, one on the left in front of the Commander, and one on the right in front of the Pilot. 
The aft flight deck station had a third ADI.15 This photo shows Discovery's forward flight deck on STS-063 (1995). The ADIs are indicated with arrows. The photo is from the National Archives. Our FDAI appears to have been significantly modified for use in the Shuttle simulator, as described above. However, it is much closer to the Apollo FDAI than the ADI used in the Shuttle, as I'll show in this section. My hypothesis is that the simulator was built before the Shuttle's ADI was created, so the Apollo FDAI was pressed into service. The Shuttle's ADI was much more complicated electrically than the Apollo FDAI and our FDAI, providing improved functionality.16 For instance, while the Apollo FDAI had a simple "OFF" indicator flag to show that the indicator had lost power, the Shuttle's ADI had extensive error detection. It contained voltage level monitors to check its five power supplies. (The Shuttle ADI used three DC power sources and two AC power sources, compared to the single AC supply for Apollo.) The Shuttle's ADI also monitored the ball servos to detect position errors. Finally, it received an external "Data OK" signal. If a fault was detected by any of these monitors, the "OFF" flag was deployed to indicate that the ADI could not be trusted. The Shuttle's ADI had six needles, the same as Apollo, but the Shuttle used feedback to make the positions more accurate. Specifically, each Shuttle needle had a feedback sensor, a Linear Variable Differential Transformer (LVDT) that generates a voltage based on the needle position. The LVDT output drove a servo feedback loop to ensure that the needle was in the exact desired position. In the Apollo FDAI, on the other hand, the needle input voltage drove a galvanometer, swinging the needle proportionally, but there was no closed loop to ensure accuracy. 
I assume that the Shuttle's ADI had integrated circuit electronics to implement this new functionality, considerably more modern than the germanium transistors in the Apollo FDAI. The Shuttle probably used the same mechanical structures to rotate the ball, but I can't confirm that. ## Conclusions The FDAI was a critical instrument in Apollo, indicating the orientation of the spacecraft in three axes. It wasn't obvious to me how the "8-ball" can rotate in three axes while still being securely connected to the instrument. The trick is that most of the mechanism rotates in two axes, while hollow hemispherical shells provide the third rotational axis. The FDAI has an interesting evolutionary history, from the experimental X-15 rocket plane and the F-4 fighter to the Gemini, Apollo, and Space Shuttle flights. Our FDAI has an unusual position in this history: since it was modified from Apollo to function in a Space Shuttle simulator, it shows aspects of both Apollo and the Space Shuttle indicators. It would be interesting to compare the design of a Shuttle ADI to the Apollo FDAI, but I haven't been able to find interior photos of a Shuttle ADI (or of an unmodified Apollo FDAI).17 You can see a brief video of the FDAI in motion here. For more, follow me on Bluesky (@righto.com), Mastodon (@kenshirriff@oldbytes.space), or RSS. (I've given up on Twitter.) I worked on this project with CuriousMarc, Mike Stewart, and Eric Schlapfer, so expect a video at some point. Thanks to Richard for providing the FDAI. I wrote about the F-4 fighter plane's attitude indicator here. Inside the FDAI. The amplifier boards have been removed for this photo. ## Notes and references 1. There were many Space Shuttle simulators, so it is unclear which simulator was the source of our FDAI. The photo below shows a simulator, with one of the ADIs indicated with an arrow. 
Presumably, our FDAI became available when a simulator was upgraded from physical instruments to the screens of the Multi-function Electronic Display System (MEDS). "Forward flight deck of the fixed-base simulator." From Introduction to Shuttle Mission Simulation The most complex simulators were the three Shuttle Mission Simulators, one of which could dynamically move to provide motion cues. These simulators were at the simulation facility in Houston—officially the Jake Garn Mission Simulator and Training Facility—which also had a guidance and navigation simulator, a Spacelab simulator, and integration with the WETF (Weightless Environment Training Facility, an underground pool to simulate weightlessness). The simulators were controlled by a computer complex containing dozens of networked computers. The host computers were three UNIVAC 1100/92 mainframes, 36-bit computers that ran the simulation models. These were supported by seventeen Concurrent Computer Corporation 3260 and 3280 super-minicomputers that simulated tracking, telemetry, and communication. The simulators also used real Shuttle computers running the actual flight software; these were IBM AP101S General-Purpose Computers (GPC). For more information, see Introduction to Shuttle Mission Simulation. NASA had additional Shuttle training facilities beyond the Shuttle Mission Simulator. The Full Fuselage Trainer was a mockup of the complete Shuttle orbiter (minus the wings). It included full instrument panels (including the ADIs), but did not perform simulations. The Crew Compartment Trainers could be positioned horizontally or vertically (to simulate pre-launch operations). They contained accurate flight decks with non-functional instruments. Three Single System Trainers provided simpler mockups for astronauts to learn each system, both during normal operation and during malfunctions, before using the more complex Shuttle Mission Simulator. 
A list of Shuttle training facilities is in Table 3.1 of Preparing for the High Frontier. Following the end of the Shuttle program, the trainers were distributed to various museums (details). ↩ 2. The Command Module for Apollo used a completely different FDAI (flight director-attitude indicator) that was built by Honeywell. The two designs can be easily distinguished: the Honeywell FDAI is round, while the Lear Siegler FDAI is octagonal. ↩ 3. The FDAI's signals are more complicated than I described above. Among other things, the IMU's gimbal angles use a different coordinate system from the FDAI, so an electromechanical unit called GASTA (Gimbal Angle Sequence Transformation Assembly) used resolvers and motors to convert the coordinates. The digital attitude error signals from the computer are converted to analog by the Inertial Measurement Unit's Coupling Data Unit (IMU CDU). For attitude, the IMU is selected with the PGNS (Primary Guidance and Navigation System) switch setting. See the Lunar Module Systems Handbook, Lunar Module System Handbook Rev A, and the Apollo Operations Handbook for more. The connections to the Apollo FDAIs. Adapted from LM-1 Systems Handbook. I think this diagram predates the ORDEAL system. (Click for a larger version.) ↩ 4. The roll, pitch, and yaw axes of the Lunar Module are not as obvious as the axes of an airplane. The diagram below defines these axes. The roll, pitch, and yaw axes of the Lunar Module. Adapted from LM Systems Handbook. ↩ 5. The amplifier is constructed on a single-sided printed circuit board. Since the components are packed tightly on the board, routing of the board was difficult. However, some of the components have long leads, protected by plastic sleeves. This provides additional flexibility for the board routing since the leads could be positioned as desired, regardless of the geometry of the component. 
As a result, the style of this board is very different from modern circuit boards, where components are usually arranged in an orderly pattern. ↩ 6. In our FDAI, the amplifier boards as well as the needle actuators are connected by pins that plug into sockets. These connections don't seem suitable for flight since they could easily vibrate loose. We suspect that the pin-and-socket connections made the module easier to reconfigure in the simulator, but were not used in flyable units. In particular, in the similar aircraft instruments (ARU/11-A) that we examined, the wires to the amplifier boards were soldered. ↩ 7. The board has a 56-volt Zener diode, but the function of the diode is unclear. The board is powered by 28 volts, not enough voltage to activate the Zener. Perhaps the diode filters high-voltage transients, but I don't see how transients could arise in that part of the circuit. (I can imagine transients when the pulse transformer switches, but the Zener isn't connected to the transformer.) ↩ 8. In more detail, each motor's control winding is a center-tapped winding, with the center connected to 28 volts DC. The amplifier board's output transistors will ground either side of the winding during alternate half-cycles of the 400 Hz cycle. This causes the motor to spin in one direction or the other. (Usually, control windings are driven 90° out of phase with the motor power, but I'm not sure how this phase shift is applied in the FDAI.) ↩ 9. The history of Bill Lear and Lear Siegler is based on Love him or hate him, Bill Lear was a creator and On Course to Tomorrow: A History of Lear Siegler Instrument Division’s Manned Spaceflight Systems 1958-1981. ↩ 10. Numerous variants of the Lear Siegler FDAI were built for Apollo, as shown below. Among other things, the length of the unit ("L MAX") varied from 8 inches to 11 inches. (Our FDAI is approximately 8 inches long.) The Apollo FDAI part number chart from Grumman Specification Control Drawing LSC350-301. 
(Click for a larger view.) ↩ 11. We examined a different ARU-11/A where the amplifier boards were not quite identical: the boards had one additional capacitor and some of the PCB traces were routed slightly differently. These boards were labeled "REV C" in the PCB copper, so they may have been later boards with a slight modification. ↩ 12. The amplifier scaling resistors were placed on the power supply board rather than the amplifier boards, which may seem strange. The advantage of this approach is that it permitted the three amplifier boards to be identical, since the components that differ between the axes were not part of the amplifier boards. This simplified the manufacture and repair of the amplifier boards. ↩ 13. On the front panel of our FDAI, the text "ROLL RATE", "PITCH RATE", and "YAW RATE" has been painted over. However, the text is still faintly visible (reversed) on the inside of the panel, as shown below. The inside of the FDAI's front cover. ↩ 14. The diagram below shows the internals of the Apollo LM FDAI at a high level. This diagram shows several differences between the LM FDAI and the FDAI that we examined. First, the roll, pitch, and yaw inputs to the LM FDAI are resolver inputs (i.e. sin and cos), rather than the synchro inputs to our FDAI. Second, the needle signals below are modulated on an 800 Hz carrier and are demodulated inside the FDAI. Our FDAI, however, uses positive or negative voltages to drive the needle galvanometers directly. A minor difference is that the diagram below shows the Power Off Flag wired to +28V internally, while our FDAI has the flag wired to connector pins, probably so the flag could be controlled by the simulator. The diagram of the FDAI in the LM Systems Handbook. Click for a larger image. ↩ 15. The Space Shuttle instruments were replaced with color LCD screens in the MEDS (Multifunction Electronic Display System) upgrade. This upgrade is discussed in New Displays for the Space Shuttle Cockpit. 
The Space Shuttle Systems Handbook shows the ADIs on the forward console (pages 263-264) and the aft console (page 275). The physical ADI is compared to the MEDS ADI display in Displays and Controls, Vol. 1 page 119. ↩ 16. The diagram below shows the internals of the Shuttle's ADI at a high level. The Shuttle's ADI is more complicated than the Apollo FDAI, even though they have the same indicator ball and needles. A diagram of the Space Shuttle's ADI. From Space Shuttle Systems Handbook Vol. 1, 1 G&C; DISP 1. (Click for a larger image.) ↩ 17. Multiple photos of the exterior of the Shuttle ADI are available here, from the National Air and Space Museum. There are interior photos of Apollo FDAIs online, but they all appear to be modified for Shuttle simulators. ↩

www.righto.com

June 22, 2025 at 4:27 AM

Reverse engineering the 386 processor's prefetch queue circuitry

www.righto.com

May 18, 2025 at 4:10 AM

The absurdly complicated circuitry for the 386 processor's registers

The groundbreaking Intel 386 processor (1985) was the first 32-bit processor in the x86 architecture. Like most processors, the 386 contains numerous registers; registers are a key part of a processor because they provide storage that is much faster than main memory. The register set of the 386 includes general-purpose registers, index registers, and segment selectors, as well as registers with special functions for memory management and operating system implementation. In this blog post, I look at the silicon die of the 386 and explain how the processor implements its main registers. It turns out that the circuitry that implements the 386's registers is much more complicated than one would expect. For the 30 registers that I examine, instead of using a standard circuit, the 386 uses _six_ different circuits, each one optimized for the particular characteristics of the register. For some registers, Intel squeezes register cells together to double the storage capacity. Other registers support accesses of 8, 16, or 32 bits at a time. Much of the register file is "triple-ported", allowing two registers to be read simultaneously while a value is written to a third register. Finally, I was surprised to find that registers don't store bits in order: the lower 16 bits of each register are interleaved, while the upper 16 bits are stored linearly. The photo below shows the 386's shiny fingernail-sized silicon die under a special metallurgical microscope. I've labeled the main functional blocks. For this post, the Data Unit in the lower left quadrant of the chip is the relevant component. It consists of the 32-bit arithmetic logic unit (ALU) along with the processor's main register bank (highlighted in red at the bottom). The circuitry, called the datapath, can be viewed as the heart of the processor. This die photo of the 386 shows the location of the registers. Click this image (or any other) for a larger version. 
The datapath is built with a regular structure: each register or ALU functional unit is a horizontal stripe of circuitry, forming the horizontal bands visible in the image. For the most part, this circuitry consists of a carefully optimized circuit copied 32 times, once for each bit of the processor. Each circuit for one bit is exactly the same width—60 µm—so the functional blocks can be stacked together like microscopic LEGO bricks. To link these circuits, metal bus lines run vertically through the datapath in groups of 32, allowing data to flow up and down through the blocks. Meanwhile, control lines run horizontally, enabling ALU operations or register reads and writes; the irregular circuitry on the right side of the Data Unit produces the signals for these control lines, activating the appropriate control lines for each instruction. The datapath is highly structured to maximize performance while minimizing its area on the die. Below, I'll look at how the registers are implemented according to this structure. ## The 386's registers A processor's registers are one of the most visible features of the processor architecture. The 386 processor contains 16 registers for use by application programmers, a small number by modern standards, but large enough for the time. The diagram below shows the eight 32-bit general-purpose registers. At the top are four registers called EAX, EBX, ECX, and EDX. Although these registers are 32-bit registers, they can also be treated as 16 or 8-bit registers for backward compatibility with earlier processors. For instance, the lower half of EAX can be accessed as the 16-bit register AX, while the bottom byte of EAX can be accessed as the 8-bit register AL. Moreover, bits 15-8 can also be accessed as an 8-bit register called AH. In other words, there are four different ways to access the EAX register, and similarly for the other three registers. As will be seen, these features complicate the implementation of the register set. 
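The four ways of accessing EAX described above can be sketched in software with masks and shifts. This is a toy model for illustration only; the class name `Reg32` and its attribute names are hypothetical, and it says nothing about how the silicon implements partial-register access.

```python
class Reg32:
    """Toy model of a 386 general-purpose register such as EAX."""

    def __init__(self, value=0):
        self.e = value & 0xFFFFFFFF  # full 32-bit value (EAX)

    @property
    def x(self):
        """Low 16 bits (AX)."""
        return self.e & 0xFFFF

    @x.setter
    def x(self, v):
        # Writing AX leaves the upper 16 bits of EAX unchanged.
        self.e = (self.e & 0xFFFF0000) | (v & 0xFFFF)

    @property
    def h(self):
        """Bits 15-8 (AH)."""
        return (self.e >> 8) & 0xFF

    @h.setter
    def h(self, v):
        self.e = (self.e & 0xFFFF00FF) | ((v & 0xFF) << 8)

    @property
    def l(self):
        """Bits 7-0 (AL)."""
        return self.e & 0xFF

    @l.setter
    def l(self, v):
        self.e = (self.e & 0xFFFFFF00) | (v & 0xFF)


eax = Reg32(0x12345678)
print(hex(eax.x), hex(eax.h), hex(eax.l))  # AX, AH, AL views of the same bits
eax.h = 0xAB  # write AH only
eax.l = 0xCD  # write AL only
print(hex(eax.e))
```

Each partial write must preserve the bits outside its field, which hints at why byte and word accesses complicate the register hardware: the register needs separate write controls for different groups of bits.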
The general purpose registers in the 386. From 80386 Programmer's Reference Manual, page 2-8. The bottom half of the diagram shows that the 32-bit EBP, ESI, EDI, and ESP registers can also be treated as 16-bit registers BP, SI, DI, and SP. Unlike the previous registers, these ones cannot be treated as 8-bit registers. The 386 also has six segment registers that define the start of memory segments; these are 16-bit registers. The 16 application registers are rounded out by the status flags and instruction pointer (EIP); they are viewed as 32-bit registers, but their implementation is more complicated. The 386 also has numerous registers for operating system programming, but I won't discuss them here, since they are likely in other parts of the chip.1 Finally, the 386 has numerous temporary registers that are not visible to the programmer but are used by the microcode to perform complex instructions. ## The 6T and 8T static RAM cells The 386's registers are implemented with static RAM cells, a circuit that can hold one bit. These cells are arranged into a grid to provide multiple registers. Static RAM can be contrasted with the dynamic RAM that computers use for their main memory: dynamic RAM holds each bit in a tiny capacitor, while static RAM uses a faster but larger and more complicated circuit. Since main memory holds gigabytes of data, it uses dynamic RAM to provide dense and inexpensive storage. But the tradeoffs are different for registers: the storage capacity is small, but speed is of the essence. Thus, registers use the static RAM circuit that I'll explain below. The concept behind a static RAM cell is to connect two inverters into a loop. If an inverter has a "0" as input, it will output a "1", and vice versa. Thus, the inverter loop will be stable, with one inverter on and one inverter off, and each inverter supporting the other. Depending on which inverter is on, the circuit stores a 0 or a 1, as shown below. 
Thus, the pair of inverters provides one bit of memory. Two inverters in a loop can store a 0 or a 1. To be useful, however, the inverter loop needs a way to store a bit into it, as well as a way to read out the stored bit. To write a new value into the circuit, two signals are fed in, forcing the inverters to the desired new values. One inverter receives the new bit value, while the other inverter receives the complemented bit value. This may seem like a brute-force way to update the bit, but it works. The trick is that the inverters in the cell are small and weak, while the input signals are higher current, able to overpower the inverters.2 These signals are fed in through wiring called "bitlines"; the bitlines can also be used to read the value stored in the cell. By adding two pass transistors to the circuit, the cell can be read and written. To control access to the register, the bitlines are connected to the inverters through pass transistors, which act as switches to control access to the inverter loop.3 When the pass transistors are on, the signals on the write lines can pass through to the inverters. But when the pass transistors are off, the inverters are isolated from the write lines. The pass transistors are turned on by a control signal, called a "wordline" since it controls access to a word of storage in the register. Since each inverter is constructed from two transistors, the circuit above consists of six transistors—thus this circuit is called a "6T" cell. The 6T cell uses the same bitlines for reading and writing, so you can't read and write to registers simultaneously. But adding two transistors creates an "8T" circuit that lets you read from one register and write to another register at the same time. (In technical terms, the register file is two-ported.) In the 8T schematic below, the two additional transistors (G and H) are used for reading. 
Transistor G buffers the cell's value; it turns on if the inverter output is high, pulling the read output bitline low.4 Transistor H is a pass transistor that blocks this signal until a read is performed on this register; it is controlled by a read wordline. Note that there are two bitlines for writing (as before) along with one bitline for reading.

Schematic of a storage cell. Each transistor is labeled with a letter.

To construct registers (or memory), a grid is constructed from these cells. Each row corresponds to a register, while each column corresponds to a bit position. The horizontal lines are the wordlines, selecting which word to access, while the vertical lines are the bitlines, passing bits in or out of the registers. For a write, the vertical bitlines provide the 32 bits (along with their complements). For a read, the vertical bitlines receive the 32 bits from the register. A wordline is activated to read or write the selected register. To summarize: each row is a register, data flows vertically, and control signals flow horizontally.

Static memory cells (8T) organized into a grid.

## Six register circuits in the 386

The die photo below zooms in on the register circuitry in the lower left corner of the 386 processor. You can see the arrangement of storage cells into a grid, but note that the pattern changes from row to row. This circuitry implements 30 registers: 22 of the registers hold 32 bits, while the bottom ones are 16-bit registers. By studying the die, I determined that there are six different register circuits, which I've arbitrarily labeled (_a_) to (_f_). In this section, I'll describe these six types of registers.

The 386's main register bank, at the bottom of the datapath. The numbers show how many bits of the register can be accessed.

I'll start at the bottom with the simplest circuit: eight 16-bit registers that I'm calling type (_f_).
You can see a "notch" on the left side of the register file because these registers are half the width of the other registers (16 bits versus 32 bits). These registers are implemented with the 8T circuit described earlier, making them dual ported: one register can be read while another register is written. As described earlier, three vertical bus lines pass through each bit: one bitline for reading and two bitlines (with opposite polarity) for writing. Each register has two control lines (wordlines): one to select a register for reading and another to select a register for writing.

The photo below shows how four cells of type (_f_) are implemented on the chip. In this image, the chip's two metal layers have been removed along with most of the polysilicon wiring, showing the underlying silicon. The dark outlines indicate regions of doped silicon, while the stripes across the doped region correspond to transistor gates. I've labeled each transistor with a letter corresponding to the earlier schematic. Observe that the layout of the bottom half is a mirrored copy of the upper half, saving a bit of space. The left and right sides are approximately mirrored; the irregular shape allows separate read and write wordlines to control the left and right halves without colliding.

Four memory cells of type (_f_), separated by dotted lines. The small irregular squares are remnants of polysilicon that weren't fully removed.

The 386's register file and datapath are designed with 60 µm of width assigned to each bit. However, the register circuit above is unusual: the image above is 60 µm wide but there are two register cells side-by-side. That is, the circuit crams _two_ bits in 60 µm of width, rather than one. Thus, this dense layout implements two registers per row (with interleaved bits), providing twice the density of the other register circuits.
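The double-density arrangement can be sketched as a row of cells in which the bits of two registers alternate. This is an illustrative model only: the function names are hypothetical, and the left/right ordering within each pair of cells is an assumption, since the article only states that the bits are interleaved.

```python
BITS = 16  # width of each type (f) register


def interleave_row(reg_left: int, reg_right: int) -> list[int]:
    """Build one physical row: cells alternate between the two registers.

    The ordering (left register first in each pair) is assumed, not measured.
    """
    row = []
    for i in range(BITS):
        row.append((reg_left >> i) & 1)
        row.append((reg_right >> i) & 1)
    return row


def read_register(row: list[int], which: int) -> int:
    """Recover one register (0 = left, 1 = right) by taking every other cell."""
    return sum(bit << i for i, bit in enumerate(row[which::2]))


row = interleave_row(0x1234, 0xABCD)
assert len(row) == 2 * BITS           # two 16-bit registers share one row
assert read_register(row, 0) == 0x1234
assert read_register(row, 1) == 0xABCD
```

In this model, each register still gets its own read and write wordlines; only the cells share a row, which matches the description of separate wordlines controlling the left and right halves.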
If you're curious to know how the transistors above are connected, the schematic below shows how the physical arrangement of the transistors above corresponds to two of the 8T memory cells described earlier. Since the 386 has two overlapping layers of metal, it is very hard to interpret a die photo with the metal layers. But see my earlier article if you want these photos.

Schematic of two static cells in the 386, labeled "R" and "L" for "right" and "left". The schematic approximately matches the physical layout.

Above the type (_f_) registers are 10 registers of type (_e_), occupying five rows of cells. These registers are the same 8T implementation as before, but these registers are 32 bits wide instead of 16. Thus, the register takes up the full width of the datapath, unlike the previous registers. As before, the double-density circuit implements two registers per row. The silicon layout is identical (apart from being 32 bits wide instead of 16), so I'm not including a photo.

Above those registers are four (_d_) registers, which are more complex. They are triple-ported registers, so one register can be written while two other registers are read. (This is useful for ALU operations, for instance, since two values can be added and the result written back at the same time.) To support reading a second register, another vertical bus line is added for each bit. Each cell has two more transistors to connect the cell to the new bitline. Another wordline controls the additional read path. Since each cell has two more transistors, there are 10 transistors in total and the circuit is called 10T.

Four cells of type (_d_). The striped green regions are the remnants of oxide layers that weren't completely removed, and can be ignored.

The diagram above shows four memory cells of type (_d_). Each of these cells takes the full 60 µm of width, unlike the previous double-density cells.
The cells are mirrored horizontally and vertically; this increases the density slightly since power lines can be shared between cells. I've labeled the transistors `A` through `H` as before, as well as the two additional transistors `I` and `J` for the second read line. The circuit is the same as before, except for the two additional transistors, but the silicon layout is significantly different.

Each of the (_d_) registers has five control lines. Two control lines select a register for reading, connecting the register to one of the two vertical read buses. The three write lines allow parts of the register to be written independently: the top 16 bits, the next 8 bits, or the bottom 8 bits. This is required by the x86 architecture, where a 32-bit register such as EAX can also be accessed as the 16-bit AX register, the 8-bit AH register, or the 8-bit AL register. Note that reading part of a register doesn't require separate control lines: the register provides all 32 bits and the reading circuit can ignore the bits it doesn't want.

Proceeding upward, the three (_c_) registers have a similar 10T implementation. These registers, however, do not support partial writes so all 32 bits must be written at once. As a result, these registers only require three control lines (two for reads and one for writes). With fewer control lines, the cells can be fit into less vertical space, so the layout is slightly more compact than the previous type (_d_) cells. The diagram below shows four type (_c_) rows above two type (_d_) rows. Although the cells have the same ten transistors, they have been shifted around somewhat.

Four rows of type (_c_) above two cells of type (_d_).

Next are the four (_b_) registers, which support 16-bit writes and 32-bit writes (but not 8-bit writes). Thus, these registers have four control lines (two for reads and two for writes).
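The three write lines of the type (_d_) registers can be modeled as bit masks that decide which cells accept new data. This is a behavioral sketch, not the actual circuit: the mask names and the `partial_write` helper are invented for illustration.

```python
# One mask per write control line of a type (d) register (names are illustrative).
WRITE_TOP16 = 0xFFFF0000  # upper half of EAX
WRITE_HIGH8 = 0x0000FF00  # bits 15-8, e.g. AH
WRITE_LOW8 = 0x000000FF   # bits 7-0, e.g. AL


def partial_write(old: int, new: int, enabled: int) -> int:
    """Update only the bits whose write line is active; other bits keep their value."""
    return ((old & ~enabled) | (new & enabled)) & 0xFFFFFFFF


eax = 0x11223344
eax = partial_write(eax, 0x0000AA00, WRITE_HIGH8)               # write AH
print(hex(eax))  # 0x1122aa44
eax = partial_write(eax, 0x0000BEEF, WRITE_HIGH8 | WRITE_LOW8)  # write AX
print(hex(eax))  # 0x1122beef
eax = partial_write(eax, 0xCAFEF00D, WRITE_TOP16 | WRITE_HIGH8 | WRITE_LOW8)  # write EAX
print(hex(eax))  # 0xcafef00d
```

In the same model, a type (_c_) register would always enable all three masks together. A type (_b_) register would presumably use two masks, one per 16-bit half, but that split is an assumption rather than something the die analysis above states.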
The cells take slightly more vertical space than the (_c_) cells due to the additional control line, but the layout is almost identical.

Finally, the (_a_) register at the top has an unusual feature: it can receive a copy of the value in the register just below it. This value is copied directly between the registers, without using the read or write buses. This register has 3 control lines: one for read, one for write, and one for copying.

A cell of type (_a_), which can copy the value in the cell of type (_b_) below.

The diagram above shows a cell of type (_a_) above a cell of type (_b_). The cell of type (_a_) is based on the standard 8T circuit, but with six additional transistors to copy the value of the cell below. Specifically, two inverters buffer the output from cell (_b_), one inverter for each side of the cell. These inverters are implemented with transistors I1 through I4.5 Two transistors, S1 and S2, act as pass-transistor switches between these inverters and the memory cell. When activated by the control line, the switch transistors allow the inverters to overwrite the memory cell with the contents of the cell below. Note that cell (_a_) takes considerably more vertical space because of the extra transistors.

## Speculation on the physical layout of the registers

I haven't determined the mapping between the 386's registers and the 30 physical registers, but I can speculate. First, the 386 has four registers that can be accessed as 8, 16, or 32-bit registers: EAX, EBX, ECX, and EDX. These must map onto the (_d_) registers, which support these access patterns. The four index registers (ESP, EBP, ESI, and EDI) can be used as 32-bit registers or 16-bit registers, matching the four (_b_) registers with the same properties. Which one of these registers can be copied to the type (_a_) register? Maybe the stack pointer (ESP) is copied as part of interrupt handling.

The register file has eight 16-bit registers, type (_f_).
Since there are six 16-bit segment registers in the 386, I suspect the 16-bit registers are the segment registers and two additional registers. The LOADALL instruction gives some clues, suggesting that the two additional 16-bit registers are LDT (Local Descriptor Table register) and TR (Task Register). Moreover, `LOADALL` handles 10 temporary registers, matching the 10 registers of type (_e_) near the bottom of the register file. The three 32-bit registers of type (_c_) may be the CR0 control register and the DR6 and DR7 debug registers.

The six 16-bit segment registers in the 386.

In this article, I'm only looking at the main register file in the datapath. The 386 presumably has other registers scattered around the chip for various purposes. For instance, the Segment Descriptor Cache contains multiple registers similar to type (_e_), probably holding cache entries. The processor status flags and the instruction pointer (EIP) may not be implemented as discrete registers.6

To the right of the register file, a complicated block of circuitry uses seven-bit values to select registers. Two values select the registers (or constants) to read, while a third value selects the register to write. I'm currently analyzing this circuitry, which should provide more insight into how the physical registers are assigned.

## The shuffle network

There's one additional complication in the register layout. As mentioned earlier, the bottom 16 bits of the main registers can be treated as two 8-bit registers.7 For example, the 8-bit AH and AL registers form the bottom 16 bits of the EAX register. I explained earlier how the registers use multiple write control lines to allow these different parts of the register to be updated separately. However, there is also a layout problem.

To see the problem, suppose you perform an 8-bit ALU operation on the AH register, which is bits 15-8 of the EAX register.
These bits must be shifted down to positions 7-0 so they can take part in the ALU operation, and then must be shifted back to positions 15-8 when stored into AH. On the other hand, if you perform an ALU operation on AL (bits 7-0 of EAX), the bits are already in position and don't need to be shifted.

To support the shifting required for 8-bit register operations, the 386's register file physically interleaves the bits of the two lower bytes (but not the high bytes). As a result, bit 0 of AL is next to bit 0 of AH in the register file, and so forth. This allows multiplexers to easily select bits from AH or AL as needed. In other words, each bit of AH and AL is in almost the correct physical position, so an 8-bit shift is not required. (If the bits were in order, each multiplexer would need to be connected to bits that are separated by eight positions, requiring inconvenient wiring.)8

The shuffle network above the register file interleaves the bottom 16 bits.

The photo above shows the shuffle network. Each bit has three bus lines associated with it: two for reads and one for writes, and these all get shuffled. On the left, the lines for the 16 bits pass straight through. On the right, though, the two bytes are interleaved. This shuffle network is located below the ALU and above the register file, so data words are shuffled when stored in the register file and then unshuffled when read from the register file.9

In the photo, the lines on the left aren't quite straight. The reason is that the circuitry above is narrower than the circuitry below. For the most part, each functional block in the datapath is constructed with the same width (60 µm) for each bit. This makes the layout simpler since functional blocks can be stacked on top of each other and the vertical bus wiring can pass straight through.
However, the circuitry above the registers (for the barrel shifter) is about 10% narrower (54.5 µm), so the wiring needs to squeeze in and then expand back out.10 There's a tradeoff of requiring more space for this wiring versus the space saved by making the barrel shifter narrower, and Intel must have considered the tradeoff worthwhile. (My hypothesis is that since the shuffle network required additional wiring to shuffle the bits, it didn't take up more space to squeeze the wiring together at the same time.)

## Conclusions

If you look in a book on processor design, you'll find a description of how registers can be created from static memory cells. However, the 386 illustrates that the implementation in a real processor is considerably more complicated. Instead of using one circuit, Intel used six different circuits for the registers in the 386.

The 386's register circuitry also shows the curse of backward compatibility. The x86 architecture supports 8-bit register accesses for compatibility with processors dating back to 1971. This compatibility requires additional circuitry such as the shuffle network and interleaved registers. Looking at the circuitry of x86 processors makes me appreciate some of the advantages of RISC processors, which avoid much of the ad hoc circuitry of x86 processors.

If you want more information about how the 386's memory cells were implemented, I wrote a lower-level article earlier. I plan to write more about the 386, so follow me on Bluesky (@righto.com) or RSS for updates.

## Footnotes and references

1. The 386 has multiple registers that are only relevant to operating systems programmers (see Chapter 4 of the 386 Programmer's Reference Manual). These include the Global Descriptor Table Register (GDTR), Local Descriptor Table Register (LDTR), Interrupt Descriptor Table Register (IDTR), and Task Register (TR). There are four Control Registers CR0-CR3; CR0 controls coprocessor usage, paging, and a few other things.
The six Debug Registers for hardware breakpoints are named DR0-DR3, DR6, and DR7. The two Test Registers for TLB testing are named TR6 and TR7. I expect that these registers are in the 386's Segment Unit and Paging Unit, rather than part of the processing datapath.

2. Typically the write driver circuit generates a strong low on one of the bitlines, flipping the corresponding inverter to a high output. As soon as one inverter flips, it will force the other inverter into the right state. To support this, the pullup transistors in the inverters are weaker than normal.

3. The pass transistor passes its signal through or blocks it. In CMOS, this is usually implemented with a transmission gate with an NMOS and a PMOS transistor in parallel. The cell uses only the NMOS transistor, which is much worse at passing a high signal than a low signal. Because there is one NMOS pass transistor on each side of the inverters, one of the transistors will be passing a low signal that will flip the state.

4. The bitline is typically precharged to a high level for a read, and then the cell pulls the line low for a 0. This is more compact than including circuitry in each cell to pull the line high.

5. Note that buffering is needed so the (_b_) cell can write to the (_a_) cell. If the cells were connected directly, cell (_a_) could overwrite cell (_b_) as easily as cell (_b_) could overwrite cell (_a_). With the inverters in between, cell (_b_) won't be affected by cell (_a_).

6. In the 8086, the processor status flags are not stored as a physical register, but instead consist of flip-flops scattered throughout the chip (details). The 386 probably has a similar implementation for the flags. In the 8086, the program counter (instruction pointer) does not exist as such. Instead, the instruction prefetch circuitry has a register holding the current prefetch address.
If the program counter address is required (to push a return address or to perform a relative branch, for instance), the program counter value is derived from the prefetch address. If the 386 is similar, the program counter won't have a physical register in the register file.

7. The x86 architecture combines two 8-bit registers to form a 16-bit register for historical reasons. The TTL-based Datapoint 2200 (1971) system had 8-bit A, B, C, D, E, H, and L registers, with the H and L registers combined to form a 16-bit indexing register for memory accesses. Intel created a microprocessor version of the Datapoint 2200's architecture, called the 8008. Intel's 8080 processor extended the register pairs so BC and DE could also be used as 16-bit registers. The 8086 kept this register design, but changed the 16-bit register names to AX, BX, CX, and DX, with the 8-bit parts called AH, AL, and so forth. Thus, the unusual physical structure of the 386's register file is due to compatibility with a programmable terminal from 1971.

8. To support 8-bit and 16-bit operations, the 8086 processor used a similar interleaving scheme with the two 8-bit halves of a register interleaved. Since the 8086 was a 16-bit processor, though, its interleaving was simpler than the 32-bit 386. Specifically, the 8086 didn't have the upper 16 bits to deal with.

9. The 386's constant ROM is located below the shuffle network. Thus, constants are stored with the bits interleaved in order to produce the right results. (This made the ROM contents incomprehensible until I figured out the shuffling pattern, but that's a topic for another article.)

10. The main body of the datapath (ALU, etc.) has the same 60 µm cell width as the register file. However, the datapath is slightly wider than the register file overall. The reason? The datapath has a small amount of circuitry between bits 7 and 8 and between bits 15 and 16, in order to handle 8-bit and 16-bit operations.
As a result, the logical structure of the registers is visible as stripes in the physical layout of the ALU below. (These stripes are also visible in the die photo at the beginning of this article.)

Part of the ALU circuitry, displayed underneath the structure of the EAX register.

www.righto.com

May 15, 2025 at 4:07 AM

Ken Shirriff's blog

@righto.com.web.brid.gy

A tricky Commodore PET repair: tracking down 6 1/2 bad chips

www.righto.com

April 25, 2025 at 3:51 AM
