The LOADALL Instruction

by

Robert Collins


Of the few undocumented instructions in the 80286 and 80386 microprocessors, the LOADALL instruction is the most widely known. Nevertheless, very few people understand how to use it. Using LOADALL is not as simple as merely knowing the opcode and its format; it also requires knowledge of many aspects of CPU behavior that are not documented in the respective data sheets.

The 286 LOADALL is widely known because a 15-page Intel confidential document describing its use was given to many developers. 286 LOADALL is so commonly used in production code that DOS 3.3 (and above) and OS/2 have provisions for using LOADALL built into them. Every 386 and 486 BIOS emulates 286 LOADALL, and even Microsoft CODEVIEW recognizes the 286 LOADALL opcode and disassembles it.

On the other hand, the 386 LOADALL is not widely known, and very few developers even know it exists. In this article, I will explain how to use both the 286 and 386 LOADALL instructions and present source code to demonstrate the various aspects of CPU behavior that become apparent, or can be proven, when using LOADALL.

Intel originally included LOADALL in the CPU mask for testing purposes and In-Circuit Emulator (ICE) support. As its name implies, LOADALL loads all of the CPU registers: the software-visible registers such as AX, and the "hidden" software-invisible registers such as the segment descriptor caches. At the completion of a LOADALL instruction, the entire CPU state is defined according to the LOADALL data table.

By manipulating the descriptor cache base registers, you can access the entire address space without switching to protected mode. In other words, by using LOADALL, you can access memory above 1MB from real mode. Since the alternative method for the 286 (switching to protected mode, accessing the desired memory, then resetting the CPU - the only way to get the 286 back to real mode) has a significant performance penalty, LOADALL is most significant to 286 programmers. LOADALL provides them with a new capability that is not available by any other means.
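As a rough illustration of why the cached base matters, the following C sketch models the address arithmetic only. It assumes, as described above, that the CPU adds the offset to the descriptor cache base rather than to the segment register value. The function names are invented for this example, the 24-bit mask reflects the 286 address bus, and A20 gating is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names; this models the arithmetic, not the hardware. */

/* Base that an ordinary real-mode segment load places in the cache. */
uint32_t real_mode_base(uint16_t segment)
{
    return (uint32_t)segment << 4;          /* 16 times the segment value */
}

/* Physical address formed from a cached base plus a 16-bit offset (286). */
uint32_t phys_addr_286(uint32_t cache_base, uint16_t offset)
{
    assert(cache_base <= 0xFFFFFFu);        /* the 286 base is 24 bits wide */
    return (cache_base + offset) & 0xFFFFFFu;
}
```

With an ordinary segment load, the highest address this model reaches is FFFF:FFFF, that is, phys_addr_286(real_mode_base(0xFFFF), 0xFFFF) = 10FFEFh. With a LOADALL-supplied base of 200000h, offset 1234h lands at 201234h, well above 1MB.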

LOADALL Details

LOADALL is closely coupled with the CPU hardware. The 286 and 386 have different internal hardware, so Intel implemented LOADALL with a different opcode on each. 80286 LOADALL (opcode 0F05) produces an invalid opcode exception when executed on the 386, and 80386 LOADALL (opcode 0F07) produces an invalid opcode exception when executed on the 286.

LOADALL loads all CPU registers (including MSW, GDTR, CSBASE, ESACCESS) from a memory image. You can execute LOADALL in real or protected mode, but only at privilege level 0 (CPL=0). If you execute LOADALL at any other privilege level, the CPU generates an exception.

By directly loading the descriptor cache registers with LOADALL, a program has explicit control over the base address, segment limit, and access rights associated with each memory segment. Normally, the CPU loads these values each time it loads a segment register, but LOADALL allows you to load these hidden registers independently of their segment register counterparts.

In real mode, LOADALL makes it possible to access a memory segment that is not associated with any segment register. Likewise in protected mode, you can access memory that has no descriptor table entry.

LOADALL performs no protection checks against any of the loaded register values. When executed at CPL 0, LOADALL itself generates no exceptions. The segment access rights and limit portions may be values that would otherwise be illegal in the context of real mode or protected mode, but LOADALL willingly loads these values with no checks. Once loaded, however, the CPU performs full access checks when accessing a segment. For example, you can load a segment whose access is marked "not present." Normally, this condition would generate exception 11, "segment not present", but LOADALL does not generate exception 11. Instead, any attempt to access this segment will generate exception 13.

LOADALL does not check coherency between the software-visible segment registers and the software-invisible segment descriptor cache registers. Any segment descriptor base register may point to any area in the CPU address space, while the software-visible segment register may contain any other arbitrary value. The CPU makes all memory references according to the descriptor cache registers, not the software-visible segment registers. All subsequent segment register loads will reload the descriptor cache register. Beware of using values in CS that do not perfectly match a code segment descriptor table entry, or a real mode code segment - an interrupt return (IRET) may either cause an exception or execution to resume at an unexpected location. Likewise, pushing and subsequently popping any segment register will force the descriptor cache register to reload according to the CPU's conventional protocol, thereby inhibiting any further real mode extended memory references.

80286 LOADALL

You encode the 80286 LOADALL as a two-byte opcode, 0F05h. LOADALL reads its table from a fixed memory location at 800h (80:0 in real-mode addressing). LOADALL performs 51 bus cycles (WORD cycles), and takes 195 clocks with no wait states. Table 1 shows the format you must prepare at location 800h before executing the 286 LOADALL instruction. All CPU register entries in the LOADALL table conform to the standard Intel format, where the least significant byte is at the lowest memory address. Table 2 shows the 286 format of the descriptor cache entries.


Table 1 -- 80286 LOADALL Table
Physical Address   Description     Data Size       Data Value

[800]              None            DW              0
[802]              None            DW              0
[804]              None            DW              0
[806]              MSW             DW              ?
[808]              None            DW              0
[80A]              None            DW              0
[80C]              None            DW              0
[80E]              None            DW              0
[810]              None            DW              0
[812]              None            DW              0
[814]              None            DW              0
[816]              TR_REG          DW              ?
[818]              FLAGS           DW              ?
[81A]              IP              DW              ?
[81C]              LDT_REG         DW              ?
[81E]              DS_REG          DW              ?
[820]              SS_REG          DW              ?
[822]              CS_REG          DW              ?
[824]              ES_REG          DW              ?
[826]              DI              DW              ?
[828]              SI              DW              ?
[82A]              BP              DW              ?
[82C]              SP              DW              ?
[82E]              BX              DW              ?
[830]              DX              DW              ?
[832]              CX              DW              ?
[834]              AX              DW              ?
[836]              ES_DESC         DESC_CACHE286   <?,?,?>
[83C]              CS_DESC         DESC_CACHE286   <?,?,?>
[842]              SS_DESC         DESC_CACHE286   <?,?,?>
[848]              DS_DESC         DESC_CACHE286   <?,?,?>
[84E]              GDT_DESC        DESC_CACHE286   <?,?,?>
[854]              LDT_DESC        DESC_CACHE286   <?,?,?>
[85A]              IDT_DESC        DESC_CACHE286   <?,?,?>
[860]              TSS_DESC        DESC_CACHE286   <?,?,?>
[866]              END OF TABLE

DESC_CACHE286 STRUC
    Addr_A15_A00 DW ?
    Addr_A23_A16 DB ?
    Access DB ?
    Limit DW ?
ENDS
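
One way to sanity-check a table image before writing it to 800h is to mirror the layout in C and let the compiler verify the offsets. The sketch below is only that: the struct and field names are made up for illustration, `#pragma pack` is a compiler extension (supported by the common PC compilers), and the MSW slot is placed at 806h, which is where the single '?' among the leading zero entries of the Data Value column falls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
/* Mirrors DESC_CACHE286: 24-bit base, access byte, 16-bit limit. */
struct desc_cache286 {
    uint16_t addr_a15_a00;
    uint8_t  addr_a23_a16;
    uint8_t  access;
    uint16_t limit;
};

/* Image that must sit at physical 800h; offsets are relative to 800h. */
struct loadall286 {
    uint16_t unused0[3];                    /* 800h-805h, zero */
    uint16_t msw;                           /* 806h */
    uint16_t unused1[7];                    /* 808h-815h, zero */
    uint16_t tr, flags, ip, ldt;            /* 816h-81Dh */
    uint16_t ds, ss, cs, es;                /* 81Eh-825h */
    uint16_t di, si, bp, sp;                /* 826h-82Dh */
    uint16_t bx, dx, cx, ax;                /* 82Eh-835h */
    struct desc_cache286 es_desc, cs_desc, ss_desc, ds_desc;     /* 836h-84Dh */
    struct desc_cache286 gdt_desc, ldt_desc, idt_desc, tss_desc; /* 84Eh-865h */
};
#pragma pack(pop)

/* Compile-time checks against the offsets listed in Table 1. */
static_assert(sizeof(struct desc_cache286) == 6, "cache entry is 6 bytes");
static_assert(offsetof(struct loadall286, msw) == 0x06, "MSW at 806h");
static_assert(offsetof(struct loadall286, tr) == 0x16, "TR at 816h");
static_assert(offsetof(struct loadall286, ax) == 0x34, "AX at 834h");
static_assert(offsetof(struct loadall286, es_desc) == 0x36, "ES cache at 836h");
static_assert(offsetof(struct loadall286, tss_desc) == 0x60, "TSS cache at 860h");
static_assert(sizeof(struct loadall286) == 0x66, "51 words, ends at 866h");
```

The final check ties the numbers together: 51 WORD bus cycles times 2 bytes is 102 bytes (66h), so the table runs from 800h up to, but not including, 866h.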

Intel recommends some guidelines for proper execution following LOADALL. The stack segment should be a read/write data segment; the code segment can be execute only (access=95h), read/execute (access=9bh), or read/write/execute (access=93h). Proper protected mode operation also requires that the DPL of CS and DPL of SS be equal. These attributes determine the CPL of the processor. Also, the DPL fields of ES and DS should be equal to 3 to prevent RETF or IRET instructions from zeroing these registers.
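
These guidelines can be checked mechanically before a table is handed to LOADALL. The C sketch below is a hypothetical helper, not part of Intel's document or the listings. It assumes the access byte uses the standard descriptor-table layout (bit 7 = P, bits 6..5 = DPL, bit 4 = S, bit 3 = executable, bit 1 = writable/readable, bit 0 = A) and deliberately does not try to interpret the code segment type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Descriptor Privilege Level: bits 6..5 of the access byte. */
unsigned access_dpl(uint8_t access)
{
    return (access >> 5) & 3u;
}

/* Read/write data segment: S = 1, executable bit = 0, writable bit = 1. */
bool is_rw_data(uint8_t access)
{
    return (access & 0x1Au) == 0x12u;
}

/* The 93h value quoted in the text passes the read/write-data mask. */
static_assert((0x93 & 0x1A) == 0x12, "93h is a read/write data segment");

/* Hypothetical checker for the protected-mode guidelines quoted above. */
bool loadall_guidelines_ok(uint8_t cs, uint8_t ss, uint8_t ds, uint8_t es)
{
    return is_rw_data(ss)                       /* stack is read/write data   */
        && access_dpl(cs) == access_dpl(ss)     /* CS.DPL == SS.DPL (the CPL) */
        && access_dpl(ds) == 3                  /* RETF/IRET will not zero DS */
        && access_dpl(es) == 3;                 /* ... or ES                  */
}
```

For example, CS=9Bh, SS=93h, DS=ES=F3h passes, while leaving DS and ES at 93h (DPL 0) fails the last two tests.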

The code in Listing 1 demonstrates how to explore the various operating modes with 286 LOADALL and how to access extended memory while in real mode. The LOADALL test performs various functions that would be impossible to duplicate without using LOADALL.

80386 LOADALL

The 386 LOADALL is encoded as a two-byte opcode (0F07). Unlike the 286 LOADALL, this LOADALL instruction reads its data from a table pointed to by ES:EDI. Segment overrides are allowed, but apparently ignored. The 386 LOADALL performs 51 bus cycles (DWORD cycles) and takes 122 clocks with no wait states. Table 3 shows the 386 LOADALL format. However, Table 3 does not show that prior to reading the LOADALL table, LOADALL reads 10 DWORDs exactly 100h bytes beyond the beginning of the table (ES:EDI+100h). This data does not end up in any of the registers that LOADALL leaves unloaded (CR2, CR3, DR0-DR3, TR6, TR7), nor in the Numeric Processor eXtension (NPX). At this time, the purpose of reading this data and its destination is a mystery. Figure 1 is an ICE trace showing all the bus cycles associated with LOADALL's execution.
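
The read sequence just described can be written down as a small C sketch, which is handy when matching an ICE trace against a table address. The function and constant names are invented for this example, and it assumes paging is off so that the linear address (ES base plus EDI) is also the physical address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    MYSTERY_DWORDS = 10,    /* reads at table + 100h */
    TABLE_DWORDS   = 51,    /* the LOADALL table itself */
    TOTAL_READS    = MYSTERY_DWORDS + TABLE_DWORDS
};

/* Fill out[] with the address of each DWORD read, in bus order, for a
   table located at table_base. */
void loadall386_read_order(uint32_t table_base, uint32_t out[TOTAL_READS])
{
    size_t n = 0;
    uint32_t i;

    for (i = 0; i < MYSTERY_DWORDS; i++)        /* the unexplained reads come first */
        out[n++] = table_base + 0x100u + 4u * i;
    for (i = 0; i < TABLE_DWORDS; i++)          /* then the table, low to high */
        out[n++] = table_base + 4u * i;

    assert(n == (size_t)TOTAL_READS);
}
```

For the table at D7F0h used in Figure 1, this gives D8F0h through D914h for the first ten reads, followed by D7F0h through D8B8h for the table proper.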

As with the 286 LOADALL, all CPU register entries in the LOADALL table are in the standard Intel format where the least significant byte is at the lowest memory address. The 386 descriptor cache entries have the format shown in Table 4.

Listing 2 shows how to test 386 LOADALL. This test is more comprehensive than the 286 LOADALL test because of the expanded capabilities of the 386 microprocessor. This test puts the CPU into various states that are illegal and are impossible to duplicate through any other software means.

LOADALL Emulation

Due to the large number of systems programs that use 286 LOADALL, all 386 and 486 BIOSes must emulate the 286 LOADALL instruction (opcode 0F05). On the 386 and 486, the 286 LOADALL instruction generates an invalid opcode exception. The BIOS traps this exception and does its best to emulate the functionality of the LOADALL instruction, but perfect emulation is impossible without using LOADALL itself. Using 386 LOADALL to emulate 286 LOADALL can be done, but has its risks. First of all, the 486 does not have a LOADALL instruction. Second, Intel has threatened to remove LOADALL from the 386 mask.

Perfect emulation is possible on the 386 by using 386 LOADALL to emulate 286 LOADALL. Listing 3 shows a TSR program that uses 386 LOADALL to emulate 286 LOADALL. The program first tests that it is running on a 386 before installing itself. By using this emulation program, you can guarantee perfect 286 LOADALL emulation.
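
The core of any such emulation is translating each 286-format descriptor cache entry into the 386 format. The C sketch below shows one plausible translation under stated assumptions; it is not taken from Listing 3. The struct and function names are hypothetical, and the placement of the access byte in byte 1 of the first DWORD follows the DESC_CACHE structure given with Table 3.

```c
#include <assert.h>
#include <stdint.h>

/* Field-by-field stand-ins for the two cache formats (names are ours). */
struct cache286 { uint16_t addr_lo; uint8_t addr_hi; uint8_t access; uint16_t limit; };
struct cache386 { uint32_t access; uint32_t addr; uint32_t limit; };

static_assert(sizeof(struct cache386) == 12, "three DWORDs per 386 entry");

/* Widen one 286 descriptor cache entry into the 386 LOADALL form. */
struct cache386 widen_desc(struct cache286 in)
{
    struct cache386 out;

    out.access = (uint32_t)in.access << 8;                    /* access byte goes in byte 1 */
    out.addr   = ((uint32_t)in.addr_hi << 16) | in.addr_lo;   /* 24-bit base, zero-extended */
    out.limit  = in.limit;                                    /* 16-bit limit, zero-extended */
    return out;
}
```

A 286 entry with base 200000h, access 93h, and limit FFFFh becomes the three DWORDs 00009300h, 00200000h, 0000FFFFh, the same shape as the descriptor cache reads visible in Figure 1.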

Conclusion

LOADALL is a very powerful instruction, but the features that make it so powerful also make it risky. For example, LOADALL can put the processor in states that are otherwise impossible to duplicate through any other software means. Using LOADALL requires a thorough understanding of how the CPU processes register loads, the ramifications of those register loads, and careful planning. The illegally induced processor states can easily cause system crashes if not properly planned for. The best way to avoid system crashes is to avoid using LOADALL unless you are totally confident in your understanding of the CPU and in your programming skills.

The 286 LOADALL is described in a 15-page Intel-confidential document. The document describes in detail how to use the instruction, and also describes many of its possible uses. LOADALL can be used to access extended memory while in real mode, and to emulate real mode while in protected mode. Programs such as RAMDRIVE, ABOVEDISC, and OS/2 use LOADALL. DOS 3.3 has provisions for using LOADALL by leaving a 102-byte 'hole' at 80:0. If you are a systems programmer and have a need to know this information, Intel will provide it, along with source code to emulate 286 LOADALL on the 386 (without using 386 LOADALL).

Unlike the 286 LOADALL, the 386 LOADALL is still an Intel top secret. I do not know of any document that describes its use or format, or that acknowledges its existence. Very few people at Intel will acknowledge that LOADALL even exists in the 80386 mask. The official Intel line is that, due to U.S. Military pressure, LOADALL was removed from the 80386 mask over a year ago. However, running the program in Listing 2 demonstrates that LOADALL is alive, well, and still available on the latest stepping of the 80386.


View source code for 286 LOADALL:
http://www.rcollins.org/ftp/source/286load/286load.asm
http://www.rcollins.org/ftp/source/286load/loadfns.286
http://www.rcollins.org/ftp/source/286load/macros.286
http://www.rcollins.org/ftp/source/include/cpu_type.asm

View source code for 386 LOADALL:
http://www.rcollins.org/ftp/source/386load/386load.asm
http://www.rcollins.org/ftp/source/386load/loadfns.386
http://www.rcollins.org/ftp/source/386load/macros.386
http://www.rcollins.org/ftp/source/include/cpu_type.asm

View source code for EMULOAD (286 LOADALL emulation using 386 LOADALL):
http://www.rcollins.org/ftp/source/emuload/emuload.asm
http://www.rcollins.org/ftp/source/include/cpu_type.asm

Download entire source code archive for 286LOAD, 386LOAD, and EMULOAD:
http://www.rcollins.org/ftp/dloads/loadall.zip

DESCRIPTOR CACHE REGISTERS

Whether in real or protected mode, the CPU stores the base address of each segment in hidden registers called descriptor cache registers. Each time the CPU loads a segment register, the segment base address, segment size limit, and access attributes (access rights) are loaded, or "cached," into these hidden registers. To enhance performance, the CPU makes all subsequent memory references via the descriptor cache registers instead of calculating the physical address, or looking up the base address in the descriptor table. Understanding the role of these hidden registers is paramount for exploiting highly advanced programming techniques, and for exploiting the undocumented LOADALL instruction. Figure 2(a) shows the descriptor cache layout for the 80286, and Figure 2(b) shows the layout for the 80386 and 80486.

Figure 2 (a) 80286 Descriptor Cache Register
Bits:   [47..32]      31  [30..29]  28  [27..25]  24  [23..00]
Field:  16-bit Limit  P   DPL       S   Type      A   24-bit base address


Figure 2 (b) 80386/80486 Descriptor Cache Register
Bits:   [31..24]  23  [22..21]  20  [19..17]  16  15  14  [13..00]
Field:  0         P   DPL       S   Type      A   0   D   0

Bits:   [63..32]
Field:  32-bit Physical Address

Bits:   [95..64]
Field:  32-bit Limit

At power-up, the descriptor cache registers are loaded with fixed, default values, the CPU is in real mode, and all segments are marked as read/write data segments, including the code segment (CS). According to Intel, each time the CPU loads a segment register in real mode, the base address is 16 times the segment value, while the access rights and size limit attributes are given fixed, "real-mode compatible" values. This is not true. In fact, only the CS descriptor cache access rights get loaded with fixed values each time the segment register is loaded - and even then only when a far jump is encountered. Loading any other segment register in real mode does not change the access rights or the segment size limit attributes stored in the descriptor cache registers. For these segments, the access rights and segment size limit attributes are honored from any previous setting (see Figure 3). Thus it is possible to have a four-gigabyte, read-only data segment in real mode on the 80386, but Intel will not acknowledge or support this mode of operation.
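
The behavior described in this paragraph can be captured in a tiny C model. This is a sketch of the article's claim, not of Intel's documented behavior: the struct and function names are hypothetical, and it covers only the non-CS segment registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one data-segment descriptor cache entry. */
struct seg_cache {
    uint32_t base;
    uint32_t limit;
    uint8_t  access;
};

/* Real-mode load of a data segment register as described above: only the
   base is recomputed; limit and access rights keep their previous values. */
void real_mode_load(struct seg_cache *c, uint16_t segment)
{
    assert(c != 0);
    c->base = (uint32_t)segment << 4;       /* 16 times the segment value */
    /* c->limit and c->access are intentionally left untouched */
}
```

Starting from a cache entry with limit FFFFFFFFh and a read-only access byte such as 91h, a call to real_mode_load() with segment 1234h changes the base to 12340h and nothing else, which is the four-gigabyte, read-only real-mode data segment mentioned above.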

Protected mode differs from real mode in this respect: each time the CPU loads a segment register, it fully loads the descriptor cache register; no previous values are honored. The CPU loads the descriptor cache directly from the descriptor table. The CPU checks the validity of the segment by testing the access rights in the descriptor table, and illegal values will generate exceptions. Any attempt to load CS with a read/write data segment will generate a protection error. Likewise, any attempt to load a data segment register with an executable segment that is not readable will also generate an exception. The CPU enforces these protection rules very strictly. If the descriptor table entry passes all the tests, then the CPU loads the descriptor cache register.

Figure 3 -- Descriptor Cache Contents (Real Mode)


Table 2 (a) -- 80286 Descriptor Cache Entry Formats
Offset Description
0-2 24-bit physical address of the segment in memory. These bytes are stored in standard Intel format with the least significant byte at the lowest memory address.
3 Access rights. The format of this byte is the same as that in the descriptor table. This access byte is loaded in the descriptor cache register regardless of its validity. Therefore the "present" bit in the access rights field becomes a "descriptor valid" bit. When this bit is cleared, the descriptor is considered invalid, and any memory reference using this descriptor generates exception 13, with error code 0. The Descriptor Privilege Level (DPL) of the SS and CS descriptor caches determines the Current Privilege Level (CPL). The CS descriptor cache may be loaded as a read/write data segment.
4-5 Segment limit. The standard 16-bit segment limit stored in standard Intel format.



Table 2 (b) -- 80286 GDT and IDT Descriptor Cache Entry Formats
Offset Description
0-2 24-bit physical address of the segment in memory.
3 Should be 0.
4-5 Segment limit.


Table 3 -- 80386 LOADALL Table
Offset   Description       Data Size    Data Value

[00]     CR0               DD           ?
[04]     EFLAGS            DD           ?
[08]     EIP               DD           ?
[0C]     EDI               DD           ?
[10]     ESI               DD           ?
[14]     EBP               DD           ?
[18]     ESP               DD           ?
[1C]     EBX               DD           ?
[20]     EDX               DD           ?
[24]     ECX               DD           ?
[28]     EAX               DD           ?
[2C]     DR6               DD           ?
[30]     DR7               DD           ?
[34]     TR_REG            REG_STRUC    <?>
[38]     LDT_REG           REG_STRUC    <?>
[3C]     GS_REG            REG_STRUC    <?>
[40]     FS_REG            REG_STRUC    <?>
[44]     DS_REG            REG_STRUC    <?>
[48]     SS_REG            REG_STRUC    <?>
[4C]     CS_REG            REG_STRUC    <?>
[50]     ES_REG            REG_STRUC    <?>
[54]     TSS_DESC          DESC_CACHE   <?,?,?>
[60]     IDT_DESC          DESC_CACHE   <?,?,?>
[6C]     GDT_DESC          DESC_CACHE   <?,?,?>
[78]     LDT_DESC          DESC_CACHE   <?,?,?>
[84]     GS_DESC           DESC_CACHE   <?,?,?>
[90]     FS_DESC           DESC_CACHE   <?,?,?>
[9C]     DS_DESC           DESC_CACHE   <?,?,?>
[A8]     SS_DESC           DESC_CACHE   <?,?,?>
[B4]     CS_DESC           DESC_CACHE   <?,?,?>
[C0]     ES_DESC           DESC_CACHE   <?,?,?>
[CC]     LENGTH OF TABLE

REG_STRUC STRUC
    REG_VAL    DW     ?
               DW     0
ENDS
DESC_CACHE STRUC
              DB     0
     _Type    DB     ?
              DB     0
              DB     0
     _Addr    DD     ?
    _Limit    DD     ?
ENDS
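
As with the 286 table, the 386 layout can be mirrored in C so that the compiler verifies the offsets. This is only a sketch: the struct and field names are made up for illustration, and `#pragma pack` is a compiler extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
/* Mirrors REG_STRUC: a 16-bit selector padded to a DWORD. */
struct reg_struc {
    uint16_t reg_val;
    uint16_t zero;
};

/* Mirrors DESC_CACHE: access rights in byte 1, then 32-bit base and limit. */
struct desc_cache386 {
    uint8_t  zero0;
    uint8_t  type;          /* access rights byte (_Type) */
    uint8_t  zero2;
    uint8_t  zero3;
    uint32_t addr;
    uint32_t limit;
};

/* Table pointed to by ES:EDI; offsets are relative to its start. */
struct loadall386 {
    uint32_t cr0, eflags, eip;                          /* 00h-0Bh */
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;    /* 0Ch-2Bh */
    uint32_t dr6, dr7;                                  /* 2Ch-33h */
    struct reg_struc tr, ldt, gs, fs, ds, ss, cs, es;   /* 34h-53h */
    struct desc_cache386 tss_desc, idt_desc, gdt_desc, ldt_desc;    /* 54h-83h */
    struct desc_cache386 gs_desc, fs_desc, ds_desc;                 /* 84h-A7h */
    struct desc_cache386 ss_desc, cs_desc, es_desc;                 /* A8h-CBh */
};
#pragma pack(pop)

/* Compile-time checks against the offsets listed in Table 3. */
static_assert(sizeof(struct desc_cache386) == 12, "cache entry is 3 DWORDs");
static_assert(offsetof(struct loadall386, dr7) == 0x30, "DR7 at 30h");
static_assert(offsetof(struct loadall386, tr) == 0x34, "TR at 34h");
static_assert(offsetof(struct loadall386, tss_desc) == 0x54, "TSS cache at 54h");
static_assert(offsetof(struct loadall386, es_desc) == 0xC0, "ES cache at C0h");
static_assert(sizeof(struct loadall386) == 0xCC, "51 DWORDs, length CCh");
```

The size check matches the bus-cycle count given earlier: 51 DWORD cycles times 4 bytes is 204 bytes (CCh).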


Table 4 (a) -- 80386 Descriptor Cache Entries
Offset Description
0-3 Access rights. The access rights dword consumes 11 bits of this 32-bit field. See figure 2 for a complete description of this field.
4-7 32-bit base address of the segment in memory.
8-11 32-bit limit of the segment.



Table 4 (b) -- 80386 GDT and IDT Descriptor Cache Entry Formats
Offset Description
0-3 Should be 0.
4-7 32-bit base address of GDTR or IDTR.
8-11 32-bit limit of GDTR or IDTR.


Figure 1 -- In-Circuit-Emulator Trace of 80386 LOADALL Instruction

Frame      The FRAME number is like a clock count for the CPU. At every CPU clock, the ICE takes a picture. When a valid cycle occurs, the ICE records its occurrence. Therefore, it is possible to determine how many CPU clocks a sequence of instructions takes to execute by reading this information.
Type       Cycle type. Shown here are F=Fetch, R=Read, and X=eXecute.
Address    The 32-bit physical address asserted on the CPU address bus during each cycle.
Data       The data asserted on the CPU data bus during each cycle.
BE3#-BE0#  Byte enable pins on the CPU. These pins determine which bytes of the 32 bits of data are valid. These pins are active low, so 8 bits of data are valid for each '0'.
W/R#       Write/Read. Write = 1, Read = 0
D/C#       Data/Code. Data = 1, Code = 0
M/IO#      Memory/IO. Memory = 1, IO = 0

Frame  Type  Address   Data      BE#   W/R#   Comments
(Dec)        (Hex)     (Hex)     3210  D/C#
                                       M/IO#

5      F     0000DE40  B490070F  0000  001    LOADALL fetched
8      X     executed 2 bytes at DE40L        LOADALL begins execution
011    R     0000D8F0  01010101  0000  011    \
013    R     0000D8F4  02020202  0000  011     \
015    R     0000D8F8  03030303  0000  011      \   The 10 "mystery"
017    R     0000D8FC  04040404  0000  011       \  reads, exactly
019    R     0000D900  05050505  0000  011        \ 100h bytes beyond
021    R     0000D904  06060606  0000  011        / the beginning of
023    R     0000D908  07070707  0000  011       /  the LOADALL table.
025    R     0000D90C  08080808  0000  011      /
027    R     0000D910  09090909  0000  011     /
029    R     0000D914  0A0A0A0A  0000  011    /
031    R     0000D7F0  7FFFFFE0  0000  011    CR0
033    R     0000D7F4  00000002  0000  011    EFLAGS
035    R     0000D7F8  00000133  0000  011    EIP
037    R     0000D7FC  66666666  0000  011    EDI
039    R     0000D800  77777777  0000  011    ESI
041    R     0000D804  55555555  0000  011    EBP
043    R     0000D808  88888888  0000  011    ESP
045    R     0000D80C  22222222  0000  011    EBX
047    R     0000D810  44444444  0000  011    EDX
049    R     0000D814  33333333  0000  011    ECX
051    R     0000D818  11111111  0000  011    EAX
053    R     0000D81C  FFFF0FF0  0000  011    DR6
055    R     0000D820  0000D402  0000  011    DR7
057    R     0000D824  xxxx0000  1100  011    TR Register
059    R     0000D828  xxxx0000  1100  011    LDT Register
061    R     0000D82C  xxxx5555  1100  011    GS Register
063    R     0000D830  xxxx4444  1100  011    FS Register
065    R     0000D834  xxxx2222  1100  011    DS Register
067    R     0000D838  xxxx6666  1100  011    SS Register
069    R     0000D83C  xxxx1111  1100  011    CS Register
071    R     0000D840  xxxx3333  1100  011    ES Register
073    R     0000D844  00008900  0000  011    TSS Descriptor Cache
075    R     0000D848  00070000  0000  011
077    R     0000D84C  00000800  0000  011
079    R     0000D850  00000000  0000  011    IDT Descriptor Cache
081    R     0000D854  00000000  0000  011
083    R     0000D858  000003FF  0000  011
085    R     0000D85C  00000000  0000  011    GDT Descriptor Cache
087    R     0000D860  00000000  0000  011
089    R     0000D864  00000000  0000  011
091    R     0000D868  00008200  0000  011    LDT Descriptor Cache
093    R     0000D86C  00090000  0000  011
095    R     0000D870  00000088  0000  011
097    R     0000D874  00008300  0000  011    GS Descriptor Cache
099    R     0000D878  00050000  0000  011
101    R     0000D87C  0000FFFF  0000  011
103    R     0000D880  00009300  0000  011    FS Descriptor Cache
105    R     0000D884  00040000  0000  011
107    R     0000D888  0000FFFF  0000  011
109    R     0000D88C  00009300  0000  011    DS Descriptor Cache
111    R     0000D890  00020000  0000  011
113    R     0000D894  0000FFFF  0000  011
115    R     0000D898  00009300  0000  011    SS Descriptor Cache
117    R     0000D89C  00060000  0000  011
119    R     0000D8A0  0000FFFF  0000  011
121    R     0000D8A4  00009B00  0000  011    CS Descriptor Cache
123    R     0000D8A8  0000DD30  0000  011
125    R     0000D8AC  0000FFFF  0000  011
127    R     0000D8B0  00009300  0000  011    ES Descriptor Cache
129    R     0000D8B4  00030000  0000  011
131    R     0000D8B8  00FFFFFF  0000  011
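
When reading the trace, the byte-enable column tells you which part of the Data column is meaningful. The C sketch below decodes a BE3#..BE0# pattern into a mask of valid data bits. The function name is invented for this example, and the pattern is passed as a 4-bit number in the order it is printed in the trace.

```c
#include <assert.h>
#include <stdint.h>

/* be is the 4-bit pattern BE3#..BE0# as printed in the trace
   (for example 0xC for "1100"). The pins are active low, so a 0 bit
   marks a valid byte lane. */
uint32_t valid_data_mask(unsigned be)
{
    uint32_t mask = 0;
    unsigned lane;

    assert(be <= 0xFu);
    for (lane = 0; lane < 4; lane++) {
        if (((be >> lane) & 1u) == 0)               /* pin low: lane is valid */
            mask |= (uint32_t)0xFFu << (8 * lane);
    }
    return mask;
}
```

The pattern 1100 yields 0000FFFFh, which is why the segment register rows show data such as xxxx5555: only the low 16 bits are read. The pattern 0000 yields FFFFFFFFh, a full DWORD.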

