The Secrets of System Management Mode

By Robert R. Collins


In the January 1997 "Undocumented Corner," I presented a brief prehistory and overview of System Management Mode (SMM), and made a comparison between 80386 ICE mode and Pentium’s SMM. As demonstrated in that column, there are many similarities and many differences between ICE mode and SMM. Even though Intel did document SMM in its Pentium manuals, it skipped a few things – the secrets of System Management Mode. This column will disclose some of those secrets. Specifically, I will discuss the state save map, show how the AutoHALT feature works, explain the I/O Restart feature (and how it is capable of restarting a string I/O operation from the beginning), and discuss interrupt servicing within SMM. Keep in mind that the information presented in this column is highly implementation dependent. Intel offers no guarantee that this behavior will exist in any future processors, or future steppings of the same processor. Therefore, it would be inappropriate to use any of these secrets in production code.

The Secrets of the State Save Map

As mentioned in my previous column, SMM’s state save map is nearly identical to the memory image used by the undocumented LOADALL instruction (see http://www.rcollins.org/articles/loadall for a description of the LOADALL instruction). In the Pentium Processor Family Developer’s Manual, Volume 3 (Intel part number 241430), Table 20-2 presents the SMRAM State Save Map. Many parts of this map are designated "Reserved." You are warned not to modify any of these locations, lest unpredictable microprocessor behavior result. Oftentimes, those words are Intel technobabble that means "the microprocessor behavior is fully defined – we just don’t want to tell you what it is." This case appears to be no different.

Table 1 shows the entire SMM State Save Map, including all of the undocumented locations. The undocumented locations can be subdivided into four categories: undocumented registers; descriptor cache; I/O restart; and unwritten. The unwritten category is self-explanatory, as these locations are never written by the Pentium processor.

Undocumented Registers

The only undocumented registers are CR4 in location 7F28, RSM Control in 7F26, and the Alternate DR6 register in location 7F24. The CR4 register contains control bits that enable many Pentium features, such as Virtual Mode Extensions, Protected Virtual Interrupts, and 4 MB Pages (See http://www.rcollins.org/articles/vme1, http://www.rcollins.org/articles/pvi1, and DDJ May 1996, or http://www.rcollins.org/ddj/May96 for a description of these features). If CR4 were not stored in the state save map, it would be impossible to restore the Pentium processor to all of its operating environments.

The Alternate DR6 register is controlled by the RSM Control register. When RSM_CTL[bit0] = 1, the least significant word of DR6 (the lower 16 bits) is loaded from the ALT_DR6 slot, instead of the normal DR6 slot at 7FCC. It appears that the remaining 15 bits in RSM Control serve no other measurable purpose. I don’t know why this alternate version of DR6 exists, and would strongly recommend that nobody rely on it existing in future versions of SMM, or even different steppings of the Pentium processor.
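
The selection between the two DR6 slots can be sketched in C. This is only a model of the behavior described above, not Intel's microcode: the buffer `map` is a hypothetical 32 KB byte array indexed by the offsets from Table 1, the helper and function names are invented for illustration, and the assumption that the upper word of DR6 still comes from the normal slot is mine.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_SIZE 0x8000u        /* hypothetical buffer indexed by Table 1 offsets */

enum {
    SLOT_DR6     = 0x7FCC,      /* normal DR6 slot */
    SLOT_RSM_CTL = 0x7F26,      /* undocumented RSM Control word */
    SLOT_ALT_DR6 = 0x7F24       /* undocumented Alternate DR6 word */
};

/* Read a little-endian 16-bit value from the map. */
static uint16_t read16(const uint8_t *map, uint32_t off)
{
    assert(off + 2u <= MAP_SIZE);
    return (uint16_t)(map[off] | (map[off + 1u] << 8));
}

/* Read a little-endian 32-bit value from the map. */
static uint32_t read32(const uint8_t *map, uint32_t off)
{
    return (uint32_t)read16(map, off) |
           ((uint32_t)read16(map, off + 2u) << 16);
}

/* Model of the DR6 value that RSM would load. */
uint32_t dr6_after_rsm(const uint8_t *map)
{
    uint32_t dr6 = read32(map, SLOT_DR6);

    /* RSM_CTL[bit0] = 1: low word comes from the Alternate DR6 slot.
       Assumption: the high word is still taken from the normal slot. */
    if (read16(map, SLOT_RSM_CTL) & 1u)
        dr6 = (dr6 & 0xFFFF0000u) | read16(map, SLOT_ALT_DR6);

    return dr6;
}
```

With RSM Control clear, the function returns the contents of the normal slot unchanged; setting bit 0 swaps in only the low 16 bits.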

Descriptor Cache Slots

The descriptor caches contain the microprocessor’s internal form of each segment register (DS, CS, and so on) and the system registers (GDT, IDT, LDT, and TR). The registers that we normally call segment registers (CS, DS, and so on) are merely user-visible registers that don’t have any real effect on the internal operations of the microprocessor. Whether in real mode, protected mode, or virtual 8086 mode, all microprocessor operations that use segments are controlled by the values in the descriptor cache registers – not the user-visible segment registers. Each time a segment register or system register is loaded, the microprocessor loads the appropriate descriptor cache register. Each descriptor cache is composed of three fields: a base address, a limit, and segment access rights. (Example 1 shows the format of the descriptor cache registers.) Some of these fields are read from descriptor tables (for example, when loading a segment register in protected mode), some are calculated (for instance, the segment register base address in real mode), some are ignored (for example, the segment access rights are not changed in real mode), and others are given hard-coded values (for example, the access rights when loading GDT and IDT).

Whenever a field in the descriptor cache register is modified, it has an immediate effect on microprocessor operations. However, there are only two ways to modify an individual field in the descriptor cache registers. The first method is to modify any of the descriptor cache slots in SMM’s state save map. Upon execution of the RSM instruction, the new values have an immediate effect. The second method requires using an in-circuit emulator (ICE) to modify the fields.

Beware of modifying the descriptor cache contents to illegal values, such as values that would be impossible to achieve through any programmatic means. (See the article at http://www.rcollins.org/Productivity/DescriptorCache.html for a detailed description of the descriptor cache contents and source code examples of changing the various fields to illegal values.) For example, you can modify the segment limit to 0xFFFEFF – a value that can’t be programmed by any other method. The CS access rights may be changed to read/writeable for protected mode.
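
As a rough illustration of the first method, the following C sketch edits a descriptor cache slot in a copy of the state save map. It mirrors the three-dword layout of Example 1 (limit, address, type). The buffer `map`, the function names, and the slot constants are assumptions for illustration (the offsets come from Table 1), and the `memcpy` approach assumes a little-endian host like the x86 itself. A real SMM handler would do the equivalent in assembly before executing RSM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAP_SIZE       0x8000u  /* hypothetical buffer indexed by Table 1 offsets */
#define ACCESS_PRESENT 0x80u    /* Present bit of the access rights byte */

/* Same field order as the Desc_cache STRUC in Example 1. */
typedef struct {
    uint32_t limit;
    uint32_t base;
    uint32_t access;
} desc_cache;

enum {
    SLOT_GDT_CACHE = 0x7F84,    /* GDT descriptor cache, 7F84-7F8F */
    SLOT_CS_CACHE  = 0x7F3C     /* CS descriptor cache, 7F3C-7F47 */
};

/* Copy one 12-byte descriptor cache slot out of the map. */
desc_cache load_cache(const uint8_t *map, uint32_t off)
{
    desc_cache dc;
    assert(off + sizeof dc <= MAP_SIZE);
    memcpy(&dc, map + off, sizeof dc);
    return dc;
}

/* Write one descriptor cache slot back into the map. */
void store_cache(uint8_t *map, uint32_t off, desc_cache dc)
{
    assert(off + sizeof dc <= MAP_SIZE);
    memcpy(map + off, &dc, sizeof dc);
}

/* Clear the Present bit in a slot, leaving limit and base untouched. */
void mark_not_present(uint8_t *map, uint32_t off)
{
    desc_cache dc = load_cache(map, off);
    dc.access &= ~ACCESS_PRESENT;
    store_cache(map, off, dc);
}
```

Calling `mark_not_present(map, SLOT_GDT_CACHE)` on a slot whose access rights are 0x82 leaves 0x02 behind, which is the kind of value that cannot be produced by loading the register normally.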

AutoHALT

The AutoHALT Restart feature of SMM is intended to give the systems designer the choice of whether or not to return to a HALT state after the return from SMM. When the microprocessor is in the halt state upon entrance to SMM, a flag is set in the AutoHALT field of the state save map (offset 0x7F02). When AutoHALT[bit0]=1, SMM was entered from the HALT state. If this bit is cleared upon exit (AutoHALT[bit0]=0), the microprocessor will continue execution at the instruction following the HLT instruction. (See Table 2 for a list of possible entry and exit values for the AutoHALT field.) In general, the AutoHALT field directs the microprocessor whether or not to restart the HLT instruction upon exit of SMM. This is accomplished by decrementing EIP and executing whatever instruction resides at that position. AutoHALT restart behavior is consistent, regardless of whether or not EIP-1 contains a HLT instruction. If the SMM handler set AutoHALT[bit0]=1 when the interrupted instruction was not a HLT instruction (AutoHALT[bit0]=0 upon entrance), it would run the risk of resuming execution at an undesired location. The RSM microcode doesn’t know the length of the interrupted instruction. Therefore, when AutoHALT[bit0]=1 upon exit, the RSM microcode blindly decrements the EIP register by 1 and resumes execution. This explains why Intel warns that unpredictable behavior may result from setting this field to restart a HLT instruction when the microprocessor wasn’t in a HALT state upon entrance. Listing One presents an algorithm that describes the AutoHALT Restart feature.
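
The resume address and a defensive handler policy can be modeled in a few lines of C. This is a sketch under stated assumptions: both function names are hypothetical, and the test follows the prose above by looking only at bit 0, whereas Listing One tests the whole 16-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Model of where execution resumes after RSM.
   saved_eip is the EIP slot of the state save map;
   autohalt_at_exit is the AutoHALT field as the handler left it. */
uint32_t resume_eip(uint32_t saved_eip, uint16_t autohalt_at_exit)
{
    if (autohalt_at_exit & 1u) {
        assert(saved_eip > 0u);     /* avoid wrap-around in this sketch */
        return saved_eip - 1u;      /* blind one-byte decrement */
    }
    return saved_eip;
}

/* Hypothetical handler policy: request a HLT restart only when SMM
   was actually entered from the HALT state, so the one-byte
   decrement lands on the HLT opcode. */
uint16_t safe_autohalt_exit(uint16_t autohalt_at_entry, int want_halt)
{
    if ((autohalt_at_entry & 1u) && want_halt)
        return 1u;
    return 0u;
}
```

Because HLT is a one-byte opcode, the decrement is correct only for the entry=1 rows of Table 2. For any longer interrupted instruction, `resume_eip` would point into the middle of it, which is the case `safe_autohalt_exit` is meant to rule out.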

I/O Restart

There are a few reserved fields in the state save map that provide support for the I/O Restart feature. The I/O Restart feature is intended to restart an I/O operation, such as OUT and IN instructions. But this task isn’t as easy as it sounds. The OUT and IN instructions are single-byte instructions. However, when the source or destination register is a 32-bit register, a size-override prefix is prepended to the normal opcode to create a two-byte opcode. Next, consider a string operation like REP OUTS BYTE PTR CS:[SI]. In this case, there is a repeat prefix (REP) and a CS override, thus adding two bytes to the single-byte opcode. The extra opcode byte(s) substantially complicate the restart process – the instruction pointer can’t be decremented by a fixed value, as in the case of the AutoHALT Restart feature.
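
To make the variable-length problem concrete, here is a small C sketch that measures the encoded length of an I/O instruction by skipping its prefixes. It is an illustration only, not how the Pentium handles the problem: the function names are hypothetical, only the prefixes and I/O opcodes relevant to this discussion are recognized, and the two-byte immediate-port forms are included for completeness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Prefix bytes relevant to I/O instructions. */
static int is_prefix(uint8_t b)
{
    switch (b) {
    case 0xF2: case 0xF3:                   /* REPNE, REP */
    case 0x26: case 0x2E: case 0x36:        /* ES, CS, SS overrides */
    case 0x3E: case 0x64: case 0x65:        /* DS, FS, GS overrides */
    case 0x66: case 0x67:                   /* operand-size, address-size */
        return 1;
    default:
        return 0;
    }
}

/* Returns the length in bytes of the I/O instruction at code[0],
   or 0 if the bytes do not encode an IN, OUT, INS, or OUTS. */
size_t io_insn_length(const uint8_t *code, size_t avail)
{
    size_t n = 0;

    assert(code != NULL);
    while (n < avail && is_prefix(code[n]))
        n++;
    if (n == avail)
        return 0;

    switch (code[n]) {
    case 0xEC: case 0xED: case 0xEE: case 0xEF:   /* IN/OUT through DX */
    case 0x6C: case 0x6D: case 0x6E: case 0x6F:   /* INS/OUTS */
        return n + 1;
    case 0xE4: case 0xE5: case 0xE6: case 0xE7:   /* IN/OUT with imm8 port */
        return (n + 2 <= avail) ? n + 2 : 0;
    default:
        return 0;
    }
}
```

For the byte sequences EE (OUT DX,AL), 66 EF (OUT DX,EAX in 16-bit code), and F3 2E 6E (REP OUTS BYTE PTR CS:[SI]), the lengths come out as 1, 2, and 3, so no single fixed decrement of EIP could restart all three.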

To overcome the variable length opcode problem, Intel added four fields to the state save map to aid in restarting I/O operations. Whenever an I/O instruction is executed, the Pentium stores the values of ECX, ESI, EDI, and EIP in temporary (internal) registers. These temp registers seem to retain their contents even when hundreds or even thousands of other instructions precede the entrance to SMM. Once an SMI# is triggered, the Pentium stores the contents of these temp registers to slots reserved for their use. ECX, ESI, EDI, and EIP are stored in the state save map slots at locations 7F08, 7F0C, 7F04, and 7F10 respectively. After completion of the SMM handler, the RSM instruction doesn’t know whether or not to restart an I/O operation without being told to do so. This is the purpose of the I/O Restart field in the state save map. When any bit in the IORestart field is set, the RSM microcode uses these undocumented fields as the restoration values for ECX, ESI, EDI, and EIP. For string operations that use the REP prefix, the operation is restarted from the very beginning using the initial values of ECX, ESI, and EDI. Listing Two shows how the I/O Restart operation behaves.
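
The restore step can be modeled in C as follows. This is a sketch of the behavior described above and in Listing Two, not Intel's implementation: `map` is a hypothetical byte array indexed by the Table 1 offsets, the type and function names are invented for illustration, and the TR12 test is carried over from Listing Two as-is.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_SIZE 0x8000u        /* hypothetical buffer indexed by Table 1 offsets */

enum {
    SLOT_IO_EIP  = 0x7F10,      /* I/O Restart EIP */
    SLOT_IO_ESI  = 0x7F0C,      /* I/O Restart ESI */
    SLOT_IO_ECX  = 0x7F08,      /* I/O Restart ECX */
    SLOT_IO_EDI  = 0x7F04,      /* I/O Restart EDI */
    SLOT_IO_FLAG = 0x7F00       /* I/O Restart flag */
};

typedef struct {
    uint32_t ecx, esi, edi, eip;
} io_regs;

/* Read a little-endian 32-bit value from the map. */
static uint32_t read32(const uint8_t *map, uint32_t off)
{
    assert(off + 4u <= MAP_SIZE);
    return (uint32_t)map[off] |
           ((uint32_t)map[off + 1u] << 8) |
           ((uint32_t)map[off + 2u] << 16) |
           ((uint32_t)map[off + 3u] << 24);
}

/* Write a little-endian 32-bit value into the map. */
void write32(uint8_t *map, uint32_t off, uint32_t v)
{
    assert(off + 4u <= MAP_SIZE);
    map[off]      = (uint8_t)v;
    map[off + 1u] = (uint8_t)(v >> 8);
    map[off + 2u] = (uint8_t)(v >> 16);
    map[off + 3u] = (uint8_t)(v >> 24);
}

/* Model of the I/O Restart step of RSM. Returns 1 if the registers
   were rewound to the values captured when the I/O instruction
   started, 0 if they were left alone. */
int rsm_io_restart(const uint8_t *map, uint32_t tr12, io_regs *regs)
{
    uint32_t flag = map[SLOT_IO_FLAG];  /* low byte, per Listing Two's mask */

    if ((tr12 & 0x200u) && (flag & 0xFFu)) {
        regs->edi = read32(map, SLOT_IO_EDI);
        regs->ecx = read32(map, SLOT_IO_ECX);
        regs->esi = read32(map, SLOT_IO_ESI);
        regs->eip = read32(map, SLOT_IO_EIP);
        return 1;
    }
    return 0;
}
```

If a REP OUTS had started with ECX=16 and was interrupted with 5 iterations remaining, a restart rewinds ECX to 16 along with ESI and EIP, so all 16 transfers are issued again rather than just the remaining 5.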

NMI or INIT from within SMM

Upon entrance to SMM, interrupts are disabled (EFLAGS.IF = 0) and both NMI and INIT are disabled. The IDT register has not been changed, and retains whatever value it had before SMM entrance. Before servicing any interrupts, it is necessary to load your own interrupt vectors, and most likely reload the IDT register with a new value. Once you issue the STI instruction, you’re ready to begin servicing interrupts. However, two asynchronous interrupt pins remain disabled: NMI and INIT.

On one occasion, I needed to write an SMM handler that was capable of servicing non-maskable interrupts (NMIs). I read the appropriate Intel manuals and wrote my SMM and NMI handlers in accordance with their (ambiguous) recommendations. After writing the NMI handler, I decided to test it by generating an NMI from within SMM. Much to my surprise, the NMI handler was never called until I returned from SMM. Either the Pentium manuals were wrong, or my interpretation of them was wrong. The Pentium Processor Family Developer’s Manual, Volume 3 describes NMI recognition within SMM in the following manner:

Although NMI requests are blocked when the CPU enters SMM, they may be enabled through software by invoking a dummy interrupt and vectoring to an Interrupt Service Routine. NMI interrupt requests will be recognized once the Interrupt Service Routine has begun executing.

This statement is highly ambiguous, and is open to at least three interpretations. However, the Pentium Processor Specification Update P54C erratum #14 (Intel part number 242480) is much more specific:

Normally the processor would ignore NMI or INIT while in SMM, except after an IRET instruction.

This is exactly what I had done, but it didn’t work. Therefore, I decided to set up some tests to determine the exact circumstances where NMI is unmasked within SMM. After collecting my results, I found that the Pentium documentation is completely wrong, as NMI isn’t unmasked under any of the circumstances described therein. Therefore, I contacted Intel’s technical support department for further clarification (I didn’t tell them that I already knew the answer). I asked for a specific example describing how to unmask NMI from within SMM, and got the following response:

Since NMI is tied to critical events like power-down and is also unrecognized while in SMM, the SMM handler can "poll" for NMI events by performing the dummy interrupt. Just write an ISR that is empty (you might need a couple of NOPs) except for the IRET; call it by doing a soft INT; while in this "ISR" any latched NMIs will be recognized.

The problem, as I saw it, was that everybody was wrong. The Pentium documentation was wrong; the Pentium errata (P54C erratum #14), which normally provides very accurate workarounds for specific anomalies, was wrong; and now Intel’s tech support was wrong. All sources were giving consistent solutions to this problem – but the solution didn’t even remotely match the behavior I had observed. My discoveries showed that most interrupt conditions don’t unmask NMI and INIT; but I found a few cases that do. Table 3 is a list of conditions that do not unmask NMI and INIT during SMM. Table 4 is a list of conditions that do unmask NMI and INIT. As you can see from these tables, unmasking NMI and INIT from within SMM doesn’t behave as documented, nor does it appear to follow any consistent methodology. For example, if BOUND (exception taken) unmasked NMI/INIT, and BOUND (exception not taken) didn’t, why didn’t INTO (exception taken) unmask NMI/INIT also? Most disappointingly, the most obvious examples of a "dummy interrupt routine" failed to provide the documented behavior.

Conclusion

If you’re planning to write your own SMM handler, hopefully you’ve learned something in this column that will give you insights while writing your own code. I wouldn’t rely on any Pentium-specific behavior. On the contrary, I would stay clear of any implementation-specific usage. Instead, I would learn from the undocumented behavior and apply that knowledge to debugging efforts. A good grasp of Intel processor internals can substantially increase productivity – if nothing else. In my next column, I’ll continue my SMM discussion by discussing the many caveats of SMM. These caveats are things that every SMM programmer should know before beginning to code; in doing so, you might save many hours debugging code that appears to be written perfectly.

Listing 3 - Logic Analyzer Trace of SMM/GDT Shutdown

8 Oct 1996 06:26                                        DAS 92A96SD-1 Disasm
GDT Shutdown after RSM                                                Page 1

Sequence Address  Data       Mnemonic                              Timestamp
----------------------------------------------------------------------------
       0 00038048 665A66EF   OUT DX,AX             SMM(16) 
         00038049 665A66EF   POP EDX               SMM(16)
         0003804B 665A66EF   POP EAX               SMM(16)
         0003804D 00AA0F58   RSM                   SMM(16)
     […] ; GDT Descriptor Cache from the SMM State Save Map
      32 0003FF88 00015BA0   ( MEM READ )          SMM                240 ns
      33 0003FF8C 00000002   ( MEM READ )          SMM                230 ns
      34 0003FF84 0000001F   ( MEM READ )          SMM                230 ns
     […]
      60 00016240 100135EA   JMPL 0010:0135        (16)               500 ns
     […]
      64 00015C48 0010017F   ( SEGMENT OVERRUN )   (13)               720 ns
         00015C4C 00008600   ( SEGMENT OVERRUN )   (13)
      65 00015C20 0010017A   ( DOUBLE FAULT )      (8)                740 ns
         00015C24 00008600   ( DOUBLE FAULT )      (8)
      66 00000000 ------4F   ( SHUTDOWN )                             490 ns

Table 1 - SMM State Save Map
Register Offset Description
7FFC CR0
7FF8 CR3
7FF4 EFLAGS
7FF0 EIP
7FEC EDI
7FE8 ESI
7FE4 EBP
7FE0 ESP
7FDC EBX
7FD8 EDX
7FD4 ECX
7FD0 EAX
7FCC DR6
7FC8 DR7
7FC4 TR Selector
7FC0 LDT Selector
7FBC GS Selector/Segment Register
7FB8 FS Selector/Segment Register
7FB4 DS Selector/Segment Register
7FB0 SS Selector/Segment Register
7FAC CS Selector/Segment Register
7FA8 ES Selector/Segment Register
7FA7-7F9C TSS Descriptor Cache (Undocumented)
7F9B-7F90 IDT Descriptor Cache (Undocumented)
7F8F-7F84 GDT Descriptor Cache (Undocumented)
7F83-7F78 LDT Descriptor Cache (Undocumented)
7F77-7F6C GS Descriptor Cache (Undocumented)
7F6B-7F60 FS Descriptor Cache (Undocumented)
7F5F-7F54 DS Descriptor Cache (Undocumented)
7F53-7F48 SS Descriptor Cache (Undocumented)
7F47-7F3C CS Descriptor Cache (Undocumented)
7F3B-7F30 ES Descriptor Cache (Undocumented)
7F2C Never written (Undocumented)
7F28 CR4 (Undocumented)
7F26 RSM Control (Undocumented)
7F24 Alternate DR6 (Undocumented)
7F23-7F14 Never written (Undocumented)
7F10 I/O Restart EIP (Undocumented)
7F0C I/O Restart ESI (Undocumented)
7F08 I/O Restart ECX (Undocumented)
7F04 I/O Restart EDI / CR0 (Undocumented)
7F02 AutoHALT Restart Flag
7F00 I/O Restart Flag
7EFC SMM Revision Identifier
7EF8 SMM Base
7EF7-7E00 Never Written (Undocumented)

Example 1 - Descriptor Cache Structure

Desc_cache STRUC
    _Limit  dd ?        ; segment limit
    _Addr   dd ?        ; base address
    _Type   dd ?        ; segment access rights
Desc_cache ENDS

Listing 1 - AutoHALT Restart Operation

if (AutoHALT & 0xFFFF) {    /* AutoHALT field (7F02) is nonzero at RSM */
    EIP = EIP - 1;          /* back up one byte and resume there */
    return;
}

Listing 2 - I/O Restart Operation

if ((TR12 & 0x200) && (IORestart & 0xFF)) {
    EDI = REP_EDI;          /* slot 7F04 */
    ECX = REP_ECX;          /* slot 7F08 */
    ESI = REP_ESI;          /* slot 7F0C */
    EIP = IORestart_EIP;    /* slot 7F10 */
    return;
}

Table 2 - AutoHALT Restart Feature
AutoHALT[b0] at entry   AutoHALT[b0] at exit   Description
0                       0                      Normal entry. Resume to next instruction.
0                       1                      Normal entry. Unpredictable behavior upon exit (so says Intel).
1                       0                      Halt state upon entry. Return to instruction following HLT instruction.
1                       1                      Halt state upon entry. Return to HLT instruction.


Table 3 - Interrupts that do not unmask NMI and INIT
Interrupt           Description
IRETD, IRET         Chosen because Intel says they unmask NMI & INIT. Observed behavior contradicts Intel documentation (behavior mentioned in erratum 14 of P54C).
INT 01 (CD 01)      Chosen because it is a software version of the single-step exception.
INT 03 (CD 03)      Chosen because it is a software version of the breakpoint exception.
INT 04 (CD 04)      Chosen because it is a software version of the overflow exception.
INT 05 (CD 05)      Chosen because it is a software version of the bound exception.
INT 06 (CD 06)      Chosen because it is a software version of the invalid opcode exception.
INT 07 (CD 07)      Chosen because it is a software version of the device unavailable exception.
INT 32 (CD 20)      Chosen because it is not a software version of any processor exception.
INTO (not taken)    Chosen because it is a software-generated exception. This exception is not subject to the interrupt bitmap of Ev86 mode, and obviously was categorized by Intel as different from other software interrupt (CD nn) instructions. This behavior was predictable, as no interrupt was taken.
INTO (taken)        Chosen as mentioned above. This behavior was not as predicted, and contradicted other observed results.
BOUND (not taken)   Chosen because it is a software-generated exception.


Table 4 - Interrupts that do unmask NMI and INIT
Test     Interrupt                    Description
Test 2   DIV-0                        DIV-0 exception unmasks NMI & INIT.
Test 3   Debug exception              Debug exception (debug register breakpoint) unmasks NMI & INIT.
Test 4   ICEBP                        The ICEBP instruction unmasks NMI & INIT. This instruction was chosen because it is a software exception not subject to the interrupt bitmap of Ev86 mode.
Test 5   BOUND (taken)                The BOUND exception (taken) unmasks NMI & INIT. This condition was chosen because it is a software exception not subject to the interrupt bitmap of Ev86 mode.
Test 6   Invalid opcode               An undefined opcode exception unmasks NMI & INIT. This condition was chosen to verify that processor-generated exceptions will unmask NMI & INIT.
Test 7   Hardware interrupts (INTR)   INTR interrupts unmask NMI & INIT.


Playing games with the access rights of the GDT

On one occasion, I set the GDT access rights to Not Present. The GDT and IDT access rights don’t exist, in theory. However, in the state save map you will discover that both descriptors have their access rights set to 0x82 – a Present, LDT type, System Descriptor. This begs the question: What happens when you change the GDT access rights? Does the Pentium ignore these access rights, or does it actually use them? To answer this question, I wrote an SMM handler that would set the GDT access rights to Not Present before executing the RSM instruction. After executing RSM, I performed a far transfer which would cause the GDT to be accessed. If the access rights were ignored when the GDT was accessed, the program would continue to operate normally. However, if the access rights were actually used, you would expect that the GDT access would cause a Not Present Fault, and eventually lead to a microprocessor shutdown.

During any far control transfer, the code segment selector would need to be authenticated as a valid code segment. The validation process requires the microprocessor to read from the GDT. If the GDT access rights had any meaning, this GDT access would cause the requisite Not Present (#NP) fault. The #NP Fault would attempt to invoke the #NP handler specified in the IDT.

The code selector for the #NP handler would need GDT validation. The GDT is still not present, and this second condition causes the microprocessor to generate a Double Fault. The Double Fault couldn’t be executed for the same reason. This third exception condition would trigger a triple fault; all triple faults generate a SHUTDOWN cycle (see Pentium Processor Family Developer’s Manual, Volume 3, section 14.9.8, or http://www.rcollins.org/Productivity/TripleFault.html for a description and source code example of a Triple Fault). The SHUTDOWN cycle could be read from the logic analyzer. My test confirmed that the Pentium does enter a shutdown state as I expected, but not as the result of a #NP exception. Instead, I found that the first exception generated was a general-protection fault, followed by a double fault, and finally a SHUTDOWN cycle. Listing Three shows a logic analyzer trace of the Pentium behavior after modifying the GDT access rights. You can see from this listing the General Protection Fault, followed by the Double Fault, then the Shutdown.
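
The escalation seen in the trace can be summarized with a toy C model. It is deliberately simplified and rests on one assumption drawn from the reasoning above: the far transfer and every subsequent handler dispatch must read the GDT, and each read fails while the cached access rights have the Present bit clear. The event codes and function name are hypothetical; 13 and 8 are the vector numbers shown in Listing Three.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACCESS_PRESENT 0x80u    /* Present bit of the cached access rights */

enum {
    EV_GP       = 13,           /* general-protection fault */
    EV_DF       = 8,            /* double fault */
    EV_SHUTDOWN = -1            /* shutdown bus cycle */
};

/* Records the events that follow a far transfer, given the GDT
   access rights loaded by RSM. Returns the number of events. */
size_t trace_far_transfer(uint32_t gdt_access, int *events, size_t cap)
{
    static const int escalation[] = { EV_GP, EV_DF, EV_SHUTDOWN };
    size_t n = 0;
    size_t step = 0;

    assert(events != NULL);

    /* Step 0 is the transfer itself, step 1 the #GP dispatch,
       step 2 the double-fault dispatch. Each needs the GDT. */
    while (!(gdt_access & ACCESS_PRESENT) && step < 3 && n < cap)
        events[n++] = escalation[step++];

    return n;
}
```

With access rights of 0x02 the model yields 13, 8, and then shutdown, matching the order in the logic analyzer trace; with 0x82 it yields no events at all.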

