Intel originally included LOADALL
in the CPU mask for testing purposes and In-Circuit
Emulator (ICE) support. As its name implies, LOADALL loads
all of the CPU registers, including the
"hidden" software-invisible registers. At the
completion of a LOADALL instruction, the entire
CPU state is defined according to the LOADALL data
table. LOADALL loads all of the software-visible
registers such as AX, and all of the
software-invisible registers such as the segment descriptor caches. By
manipulating the descriptor cache base registers, you can
access the entire address space without switching to
protected mode. In other words, by using LOADALL, you
can access memory above 1 MB from real mode. Since the
alternative method for the 286 (switching to protected
mode, accessing the desired memory, then resetting the
CPU - the only way to get the 286 back to real mode) has
a significant performance penalty, LOADALL is most
significant to 286 programmers. LOADALL provides
them with a new capability that is not available by any
other means.
LOADALL Details
LOADALL is closely coupled with the CPU
hardware. The 286 and 386 have different internal
hardware, so Intel implemented LOADALL using
different opcodes on the two processors. 80286 LOADALL (opcode
0F05) produces an invalid opcode exception when executed
on the 386, and 80386 LOADALL (opcode 0F07)
produces an invalid opcode exception when executed on the
286.
LOADALL loads all CPU registers (including MSW,
GDTR, CSBASE, ESACCESS) from a memory image. You can
execute LOADALL in real or protected mode, but
only at privilege level 0 (CPL=0). If you execute LOADALL
at any other privilege level, the CPU generates an
exception.
By directly loading the descriptor cache registers
with LOADALL, a program has explicit control over
the base address, segment limit, and access rights
associated with each memory segment. Normally, the CPU
loads these values each time it loads a segment register,
but LOADALL allows you to load these hidden
registers independently of their segment register
counterparts.
In real mode, LOADALL makes it possible to
access a memory segment that is not associated with any
segment register. Likewise in protected mode, you can
access memory that has no descriptor table entry.
LOADALL performs no protection checks against
any of the loaded register values. When you execute it at
CPL 0, LOADALL can generate no exceptions. The
segment access rights and limit portions may be values
that would otherwise be illegal in the context of real
mode or protected mode, but LOADALL willingly
loads these values with no checks. Once loaded, however,
the CPU performs full access checks when accessing a
segment. For example, you can load a segment whose access
is marked "not present." Normally, this
condition would generate exception 11, "segment not
present", but LOADALL does not generate
exception 11. Instead, any attempt to access this segment
will generate exception 13.
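The deferred checking described above can be modeled in a few lines. This is a minimal Python sketch, not code from the article's listings: the class and function names are illustrative assumptions, and the access-rights byte is decoded using the bit positions given in Figure 2(a) (P in bit 7, DPL in bits 6..5). The point it illustrates is that the load accepts anything, while the reference performs the check.

```python
from dataclasses import dataclass


class GeneralProtectionFault(Exception):
    """Stands in for CPU exception 13 in this model (hypothetical name)."""


@dataclass
class CachedSegment:
    base: int    # physical base address held in the descriptor cache
    limit: int   # highest valid offset
    access: int  # access-rights byte: P | DPL | S | Type | A

    @property
    def present(self) -> bool:
        return bool(self.access & 0x80)

    @property
    def dpl(self) -> int:
        return (self.access >> 5) & 0x3


def loadall_segment(base, limit, access):
    """Model LOADALL: accept any values, with no checks at load time."""
    return CachedSegment(base, limit, access)


def reference(seg, offset):
    """Model a memory reference: the checks happen here, at access time.

    Limit violations are folded into the same error to keep the model small.
    """
    if not seg.present or offset > seg.limit:
        raise GeneralProtectionFault(f"exception 13 at offset {offset:#x}")
    return seg.base + offset
```

Loading a segment with access 13h (present bit clear) succeeds in this model, and only the later call to `reference` fails, which mirrors the behavior the text describes.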
LOADALL does not check coherency between the
software-visible segment registers and the
software-invisible segment descriptor cache registers.
Any segment descriptor base register may point to any
area in the CPU address space, while the software-visible
segment register may contain any other arbitrary value.
The CPU makes all memory references according to the
descriptor cache registers, not the software-visible
segment registers. All subsequent segment register loads
will reload the descriptor cache register. Beware of
using values in CS that do not perfectly match a code
segment descriptor table entry, or a real mode code
segment - an interrupt return (IRET) may either
cause an exception or execution to resume at an
unexpected location. Likewise, pushing and subsequently
popping any segment register will force the descriptor
cache register to reload according to the CPU's
conventional protocol, thereby inhibiting any further
real mode extended memory references.
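The effect of a later segment register load can be sketched as follows. This is a simplified Python model under stated assumptions: the dictionary fields and function names are hypothetical, and the reload rule (base becomes 16 times the segment value, limit and access left alone) follows the real-mode behavior this article describes for non-CS segment registers.

```python
def loadall_cache(base, limit=0xFFFF, access=0x93):
    """Descriptor cache contents as LOADALL would leave them (model only)."""
    return {"base": base, "limit": limit, "access": access}


def real_mode_reload(cache, segment_value):
    """Model MOV/POP into a segment register in real mode.

    Only the base is recomputed; limit and access keep their old values.
    """
    reloaded = dict(cache)
    reloaded["base"] = (segment_value & 0xFFFF) << 4
    return reloaded


def physical_address(cache, offset):
    """The CPU forms addresses from the cached base, not the visible register."""
    return cache["base"] + offset
```

With a base of 110000h loaded by LOADALL, offset 20h resolves above 1 MB. After `real_mode_reload(cache, 0x1234)` the same offset resolves to 12360h, so the extended memory window is gone.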
80286 LOADALL
You encode the 80286 LOADALL as a two-byte
opcode, 0F05h. LOADALL reads its table from a
fixed memory location at 800h (80:0 in real-mode
addressing). LOADALL performs 51 bus cycles (WORD
cycles), and takes 195 clocks with no wait states. Table 1 shows the format you must
prepare at location 800h before executing the 286 LOADALL
instruction. All CPU register entries in the LOADALL
table conform to the standard Intel format, where the
least significant byte is at the lowest memory address. Table 2 shows the 286 format of the
descriptor cache entries.
Table 1 -- 80286 LOADALL Table
Physical Address | Description | Data Size | Data Value |
[800] | None | DW | 0 |
[802] | None | DW | 0 |
[804] | None | DW | 0 |
[806] | MSW | DW | ? |
[808] | None | DW | 0 |
[80A] | None | DW | 0 |
[80C] | None | DW | 0 |
[80E] | None | DW | 0 |
[810] | None | DW | 0 |
[812] | None | DW | 0 |
[814] | None | DW | 0 |
[816] | TR_REG | DW | ? |
[818] | FLAGS | DW | ? |
[81A] | IP | DW | ? |
[81C] | LDT_REG | DW | ? |
[81E] | DS_REG | DW | ? |
[820] | SS_REG | DW | ? |
[822] | CS_REG | DW | ? |
[824] | ES_REG | DW | ? |
[826] | DI | DW | ? |
[828] | SI | DW | ? |
[82A] | BP | DW | ? |
[82C] | SP | DW | ? |
[82E] | BX | DW | ? |
[830] | DX | DW | ? |
[832] | CX | DW | ? |
[834] | AX | DW | ? |
[836] | ES_DESC | DESC_CACHE286 | <?,?,?> |
[83C] | CS_DESC | DESC_CACHE286 | <?,?,?> |
[842] | SS_DESC | DESC_CACHE286 | <?,?,?> |
[848] | DS_DESC | DESC_CACHE286 | <?,?,?> |
[84E] | GDT_DESC | DESC_CACHE286 | <?,?,?> |
[854] | LDT_DESC | DESC_CACHE286 | <?,?,?> |
[85A] | IDT_DESC | DESC_CACHE286 | <?,?,?> |
[860] | TSS_DESC | DESC_CACHE286 | <?,?,?> |
[866] | END OF TABLE | | |

DESC_CACHE286 STRUC
    Addr_A15_A00 DW ?
    Addr_A23_A16 DB ?
    Access       DB ?
    Limit        DW ?
DESC_CACHE286 ENDS
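To make the layout of Table 1 concrete, here is a hedged Python sketch that packs a 102-byte image in the same order. It is not the article's Listing 1: the function names and dictionary keys are hypothetical, and it assumes the MSW sits at 806h, the one word in the reserved area whose Data Value is "?" rather than 0.

```python
import struct

TABLE_286_SIZE = 0x66  # 800h through 865h, 102 bytes

# Word registers in table order, starting at 816h.
WORD_REGS = ["TR", "FLAGS", "IP", "LDT", "DS", "SS", "CS", "ES",
             "DI", "SI", "BP", "SP", "BX", "DX", "CX", "AX"]
# Descriptor cache entries in table order, starting at 836h.
DESCS = ["ES", "CS", "SS", "DS", "GDT", "LDT", "IDT", "TSS"]


def pack_desc286(base, access, limit):
    """DESC_CACHE286: base A15-A00, base A23-A16, access byte, 16-bit limit."""
    return struct.pack("<HBBH", base & 0xFFFF, (base >> 16) & 0xFF,
                       access, limit)


def build_loadall286(msw, regs, descs):
    """Return the byte image to be copied to physical address 800h."""
    image = struct.pack("<3H", 0, 0, 0)        # 800h-805h: unused
    image += struct.pack("<H", msw)            # 806h: MSW (assumed position)
    image += struct.pack("<7H", *([0] * 7))    # 808h-815h: unused
    image += struct.pack("<16H", *(regs.get(n, 0) for n in WORD_REGS))
    for name in DESCS:
        image += pack_desc286(*descs.get(name, (0, 0, 0)))
    assert len(image) == TABLE_286_SIZE
    return image
```

For example, `build_loadall286(0, {"AX": 0x1234}, {"ES": (0x110000, 0x93, 0xFFFF)})` yields an image whose ES descriptor cache entry points at 110000h, above the 1 MB boundary. Registers and descriptors left out of the dictionaries default to zero here, which is a convenience of the sketch and not a recommended machine state.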
Intel recommends some guidelines for
proper execution following LOADALL. The stack
segment should be a read/write data segment; the code
segment can be execute only (access=95h), read/execute
(access=9bh), or read/write/execute (access=93h). Proper
protected mode operation also requires that the DPL of CS
and DPL of SS be equal. These attributes
determine the CPL of the processor. Also, the DPL fields
of ES and DS should be equal to 3 to
prevent RETF or IRET instructions from
zeroing these registers.
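These guidelines lend themselves to a quick sanity check before a table is used. The following Python sketch is an assumption-laden illustration, not an Intel-supplied test: the function names are hypothetical, the three code-segment access values are taken verbatim from the paragraph above, and the bit tests rely on the usual access-byte layout (S in bit 4, executable in bit 3, writable in bit 1, DPL in bits 6..5).

```python
CODE_ACCESS_VALUES = {0x95, 0x9B, 0x93}  # the three values quoted in the text


def dpl(access):
    """Descriptor Privilege Level: bits 6..5 of the access-rights byte."""
    return (access >> 5) & 0x3


def check_guidelines(cs, ss, ds, es, protected_mode=True):
    """Return a list of guideline violations for four access-rights bytes."""
    problems = []
    # Compare CS with the DPL bits masked off; the quoted values use DPL 0.
    if (cs & ~0x60) not in CODE_ACCESS_VALUES:
        problems.append("CS access is not one of 95h, 9Bh, 93h")
    # Read/write data segment: S=1, executable bit clear, writable bit set.
    if (ss & 0x18) != 0x10 or not (ss & 0x02):
        problems.append("SS is not a read/write data segment")
    if protected_mode:
        if dpl(cs) != dpl(ss):
            problems.append("DPL of CS and SS differ")
        if dpl(ds) != 3 or dpl(es) != 3:
            problems.append("DPL of DS and ES should be 3")
    return problems
```

An empty list means the four access bytes satisfy the guidelines as this sketch interprets them. It says nothing about whether the rest of the table describes a usable machine state.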
The code in Listing 1
demonstrates how to explore the various operating modes
with 286 LOADALL and how to access extended memory
while in real mode. The LOADALL test performs
various functions that would be impossible to duplicate
without using LOADALL.
80386 LOADALL
The 386 LOADALL is encoded as a two-byte opcode
(0F07). Unlike the 286 LOADALL, this LOADALL instruction
reads its data from a table pointed to by ES:EDI. Segment
overrides are allowed, but apparently ignored. The 386
LOADALL performs 51 bus cycles (DWORD cycles) and takes
122 clocks with no wait states. Table 3 shows the 386
LOADALL format. However, Table 3 does
not show that prior to reading the LOADALL table, LOADALL
reads 10 DWORDs exactly 100h bytes beyond the beginning
of the table (ES:EDI+100h). This data is not used to load
any of the registers missing from the LOADALL table (CR2, CR3,
DR0-DR3, TR6, TR7), nor the Numeric Processor eXtension
(NPX). At this time, the purpose of reading this data and
its destination is a mystery. Figure 1 shows an ICE trace
showing all the bus cycles associated with LOADALL's
execution.
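The read sequence just described is easy to reproduce arithmetically. The Python sketch below is an illustration only, with a hypothetical function name, and `table_base` is assumed to be the physical address of the table (the ES base plus EDI). It lists the 10 reads at +100h followed by the 51 DWORD reads of the table itself.

```python
def loadall386_read_addresses(table_base):
    """Physical addresses read by 386 LOADALL, in the order described."""
    # Ten DWORDs read from 100h bytes beyond the start of the table.
    mystery = [table_base + 0x100 + 4 * i for i in range(10)]
    # The table proper: offsets 00h through C8h, one DWORD at a time.
    table = [table_base + 4 * i for i in range(0xCC // 4)]
    return mystery + table
```

For a table at 0000D7F0h, the address used in the Figure 1 trace, this gives reads at 0000D8F0h through 0000D914h followed by 0000D7F0h through 0000D8B8h, which matches the addresses recorded by the ICE.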
As with the 286 LOADALL, all CPU register entries in
the LOADALL table are in the standard Intel format where
the least significant byte is at the lowest memory
address. The 386 descriptor cache entries have the format
shown in Table 4.
Listing 2 shows how to test 386 LOADALL. This test is more
comprehensive than the 286 LOADALL test because of the
expanded capabilities of the 386 microprocessor. This
test puts the CPU into various states that are illegal
and are impossible to duplicate through any other
software means.
LOADALL Emulation
Due to the large number of systems programs that use
286 LOADALL, all 386 and 486 BIOSes must emulate
the 286 LOADALL instruction (opcode 0F05). On
the 386 and 486, the 286 LOADALL instruction
generates an invalid opcode exception. The BIOS traps
this exception and does its best to emulate the
functionality of the LOADALL instruction, but
perfect emulation is impossible without using LOADALL itself.
Using 386 LOADALL to emulate 286 LOADALL can
be done, but has its risks. First of all, the 486 does
not have a LOADALL instruction. Second, Intel has
threatened to remove LOADALL from the 386 mask.
Perfect emulation is possible on the 386 by using 386 LOADALL
to emulate 286 LOADALL. Listing 3
shows a TSR program that uses 386 LOADALL to
emulate 286 LOADALL. The program first verifies that
the CPU is a 386 before installing itself. By using this
emulation program, you can guarantee perfect 286 LOADALL
emulation.
Conclusion
LOADALL is a very powerful instruction, but the
features that make it so powerful also make it risky. For
example, LOADALL can put the processor in states
that are otherwise impossible to duplicate through any
other software means. Using LOADALL requires a
thorough understanding of how the CPU processes register
loads, the ramifications of those register loads, and
careful planning. The illegally induced processor states
can easily cause system crashes if not properly planned
for. The best way to avoid system crashes is to avoid
using LOADALL unless you are totally confident in
your understanding of the CPU and in your programming
skills.
The 286 LOADALL is described in a 15-page
Intel-confidential document. The document describes in
detail how to use the instruction, and also describes
many of its possible uses. LOADALL can be used to
access extended memory while in real mode, and to emulate
real mode while in protected mode. Programs such as
RAMDRIVE, ABOVEDISC, and OS/2 use LOADALL. DOS 3.3
has provisions for using LOADALL by leaving a
102-byte 'hole' at 80:0. If you are a systems programmer
and have a need to know this information, Intel will
provide it, along with source code to emulate 286 LOADALL
on the 386 (without using 386 LOADALL).
Unlike the 286 LOADALL, the 386 LOADALL is
still an Intel top secret. I do not know of any document
that describes its use or format, or acknowledges its
existence. Very few people at Intel will acknowledge that
LOADALL even exists in the 80386 mask. The
official Intel line is that, due to U.S. Military
pressure, LOADALL was removed from the 80386 mask
over a year ago. However, running the program in
Listing 2 demonstrates that LOADALL is alive,
well, and still available on the latest stepping of the
80386.
View source code for 286 LOADALL:
http://www.rcollins.org/ftp/source/286load/286load.asm
http://www.rcollins.org/ftp/source/286load/loadfns.286
http://www.rcollins.org/ftp/source/286load/macros.286
http://www.rcollins.org/ftp/source/include/cpu_type.asm
View source code for 386 LOADALL:
http://www.rcollins.org/ftp/source/386load/386load.asm
http://www.rcollins.org/ftp/source/386load/loadfns.386
http://www.rcollins.org/ftp/source/386load/macros.386
http://www.rcollins.org/ftp/source/include/cpu_type.asm
View source code for EMULOAD (286 LOADALL
emulation using 386 LOADALL):
http://www.rcollins.org/ftp/source/emuload/emuload.asm
http://www.rcollins.org/ftp/source/include/cpu_type.asm
Download entire source code archive for 286LOAD,
386LOAD, and EMULOAD:
http://www.rcollins.org/ftp/dloads/loadall.zip
DESCRIPTOR CACHE REGISTERS
Whether in real or protected mode, the CPU
stores the base address of each segment in hidden
registers called descriptor cache registers. Each
time the CPU loads a segment register, the
segment base address, segment size limit, and
access attributes (access rights) are loaded, or
"cached," into these hidden
registers. To enhance performance, the CPU makes
all subsequent memory references via the
descriptor cache registers instead of calculating
the physical address, or looking up the base
address in the descriptor table. Understanding
the role of these hidden registers is paramount
for exploiting highly advanced programming
techniques, and for exploiting the undocumented
LOADALL instruction. Figure 2(a)
shows the descriptor cache layout for the 80286,
and Figure 2(b) shows the
layout for the 80386 and 80486.
Figure 2 (a) -- 80286 Descriptor Cache Register
[47..32] | 31 | [30..29] | 28 | [27..25] | 24 | [23..00] |
16-bit Limit | P | DPL | S | Type | A | 24-bit base address |

Figure 2 (b) -- 80386/80486 Descriptor Cache Register
[31..24] | 23 | [22..21] | 20 | [19..17] | 16 | 15 | 14 | [13..00] |
0 | P | DPL | S | Type | A | 0 | D | 0 |
[63..32] |
32-bit Physical Address |
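The 80286 layout in Figure 2(a) can be read as a single 48-bit value. The Python sketch below packs and unpacks that value using the bit positions from the figure. The function names are hypothetical, and treating the register as one little-endian integer is an assumption made for illustration.

```python
def pack_cache286(limit, access, base):
    """Combine limit [47..32], access byte [31..24], and base [23..00]."""
    return ((limit & 0xFFFF) << 32) | ((access & 0xFF) << 24) | (base & 0xFFFFFF)


def unpack_cache286(value):
    """Split a 48-bit 80286 descriptor cache value into its fields."""
    return {
        "limit": (value >> 32) & 0xFFFF,
        "present": (value >> 31) & 1,
        "dpl": (value >> 29) & 3,
        "s": (value >> 28) & 1,
        "type": (value >> 25) & 7,
        "accessed": (value >> 24) & 1,
        "base": value & 0xFFFFFF,
    }
```

Read this way, the six bytes of a DESC_CACHE286 entry (base, access, limit, least significant byte first) are the same 48 bits in little-endian order: `int.from_bytes(entry, "little")` gives the value that `unpack_cache286` expects.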
At power-up, the descriptor
cache registers are loaded with fixed, default
values, the CPU is in real mode, and all segments
are marked as read/write data segments, including
the code segment (CS). According to Intel, each
time the CPU loads a segment register in real
mode, the base address is 16 times the segment
value, while the access rights and size limit
attributes are given fixed, "real-mode
compatible" values. This is not true. In
fact, only the CS descriptor cache access rights
get loaded with fixed values each time the
segment register is loaded - and even then only
when a far jump is encountered. Loading any other
segment register in real mode does not change the
access rights or the segment size limit
attributes stored in the descriptor cache
registers. For these segments, the access rights
and segment size limit attributes are honored
from any previous setting (see Figure 3).
Thus it is possible to have a four-gigabyte,
read-only data segment in real mode on
the 80386, but Intel will not acknowledge or
support this mode of operation.
Protected mode differs from real mode in this
respect: each time the CPU loads a segment
register, it fully loads the descriptor cache
register, and no previous values are honored. The CPU
loads the descriptor cache directly from the
descriptor table. The CPU checks the validity of
the segment by testing the access rights in the
descriptor table, and illegal values will
generate exceptions. Any attempt to load CS with
a read/write data segment will generate a
protection error. Likewise, any attempt to load a
data segment register as an executable segment
will also generate an exception. The CPU enforces
these protection rules very strictly. If the
descriptor table entry passes all the tests, then
the CPU loads the descriptor cache register.
Figure 3 -- Descriptor Cache Contents (Real Mode)
Table 2 (a) -- 80286 Descriptor Cache Entry Formats
Offset | Description |
0-2 | 24-bit physical address of the segment in memory. These bytes are stored in standard Intel format with the least significant byte at the lowest memory address. |
3 | Access rights. The format of this byte is the same as that in the descriptor table. This access byte is loaded in the descriptor cache register regardless of its validity. Therefore the "present" bit in the access rights field becomes a "descriptor valid" bit. When this bit is cleared, the descriptor is considered invalid, and any memory reference using this descriptor generates exception 13, with error code 0. The Descriptor Privilege Level (DPL) of the SS and CS descriptor caches determines the Current Privilege Level (CPL). The CS descriptor cache may be loaded as a read/write data segment. |
4-5 | Segment limit. The standard 16-bit segment limit stored in standard Intel format. |

Table 2 (b) -- 80286 GDT and IDT Descriptor Cache Entry Formats
Offset | Description |
0-2 | 24-bit physical address of the segment in memory. |
3 | Should be 0. |
4-5 | Segment limit. |
Table 3 -- 80386 LOADALL Table
Offset | Description | Data Size | Data Value |
[00] | CR0 | DD | ? |
[04] | EFLAGS | DD | ? |
[08] | EIP | DD | ? |
[0C] | EDI | DD | ? |
[10] | ESI | DD | ? |
[14] | EBP | DD | ? |
[18] | ESP | DD | ? |
[1C] | EBX | DD | ? |
[20] | EDX | DD | ? |
[24] | ECX | DD | ? |
[28] | EAX | DD | ? |
[2C] | DR6 | DD | ? |
[30] | DR7 | DD | ? |
[34] | TR_REG | REG_STRUC | <?> |
[38] | LDT_REG | REG_STRUC | <?> |
[3C] | GS_REG | REG_STRUC | <?> |
[40] | FS_REG | REG_STRUC | <?> |
[44] | DS_REG | REG_STRUC | <?> |
[48] | SS_REG | REG_STRUC | <?> |
[4C] | CS_REG | REG_STRUC | <?> |
[50] | ES_REG | REG_STRUC | <?> |
[54] | TSS_DESC | DESC_CACHE | <?,?,?> |
[60] | IDT_DESC | DESC_CACHE | <?,?,?> |
[6C] | GDT_DESC | DESC_CACHE | <?,?,?> |
[78] | LDT_DESC | DESC_CACHE | <?,?,?> |
[84] | GS_DESC | DESC_CACHE | <?,?,?> |
[90] | FS_DESC | DESC_CACHE | <?,?,?> |
[9C] | DS_DESC | DESC_CACHE | <?,?,?> |
[A8] | SS_DESC | DESC_CACHE | <?,?,?> |
[B4] | CS_DESC | DESC_CACHE | <?,?,?> |
[C0] | ES_DESC | DESC_CACHE | <?,?,?> |
[CC] | LENGTH OF TABLE | | |

REG_STRUC STRUC
    REG_VAL DW ?
            DW 0
REG_STRUC ENDS

DESC_CACHE STRUC
            DB 0
    _Type   DB ?
            DB 0
            DB 0
    _Addr   DD ?
    _Limit  DD ?
DESC_CACHE ENDS
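As with the 286 table, the Table 3 layout can be expressed as a short packing routine. This Python sketch is a hedged illustration and not the article's Listing 2. The function names and dictionary keys are hypothetical, and entries left out default to zero purely for convenience.

```python
import struct

DWORD_REGS = ["CR0", "EFLAGS", "EIP", "EDI", "ESI", "EBP", "ESP",
              "EBX", "EDX", "ECX", "EAX", "DR6", "DR7"]          # 00h-33h
SELECTORS = ["TR", "LDT", "GS", "FS", "DS", "SS", "CS", "ES"]    # 34h-53h
DESCS = ["TSS", "IDT", "GDT", "LDT", "GS", "FS",
         "DS", "SS", "CS", "ES"]                                 # 54h-CBh


def pack_desc386(access, base, limit):
    """DESC_CACHE: DB 0, _Type DB, DB 0, DB 0, _Addr DD, _Limit DD."""
    return struct.pack("<BBBBII", 0, access, 0, 0, base, limit)


def build_loadall386(regs, selectors, descs):
    """Return the 0CCh-byte image that ES:EDI would point to."""
    image = b"".join(struct.pack("<I", regs.get(n, 0)) for n in DWORD_REGS)
    # REG_STRUC: 16-bit selector followed by a zero word.
    image += b"".join(struct.pack("<HH", selectors.get(n, 0), 0)
                      for n in SELECTORS)
    image += b"".join(pack_desc386(*descs.get(n, (0, 0, 0))) for n in DESCS)
    assert len(image) == 0xCC
    return image
```

As a cross-check against the Figure 1 trace, packing a CS descriptor of `(0x9B, 0xDD30, 0xFFFF)` produces the three DWORDs 00009B00, 0000DD30, and 0000FFFF at offset B4h, the same values the ICE recorded for the CS descriptor cache.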
Table 4 (a) -- 80386 Descriptor Cache Entries
Offset | Description |
0-3 | Access rights. The access rights dword consumes 11 bits of this 32-bit field. See Figure 2 for a complete description of this field. |
4-7 | 32-bit base address of the segment in memory. |
8-11 | 32-bit limit of the segment. |

Table 4 (b) -- 80386 GDT and IDT Descriptor Cache Entry Formats
Offset | Description |
0-3 | Should be 0. |
4-7 | 32-bit base address of GDTR or IDTR. |
8-11 | 32-bit limit of GDTR or IDTR. |
Figure 1 -- In-Circuit-Emulator Trace of 80386 LOADALL Instruction
Frame | The FRAME number is like a clock count for the CPU. At every CPU clock, the ICE takes a picture. When a valid cycle occurs, the ICE records its occurrence. Therefore, it is possible to determine how many CPU clocks a sequence of instructions takes to execute by reading this information. |
Type | Cycle type. Shown here are F=Fetch, R=Read, and X=eXecute. |
Address | The 32-bit physical address asserted on the CPU address bus during each cycle. |
Data | The data asserted on the CPU data bus during each cycle. |
BE3# BE2# BE1# BE0# | Byte enable pins on the CPU. These pins determine which bytes of the 32 bits of data are valid. These pins are active low, so 8 bits of data are valid for each '0'. |
W/R# | Write/Read. Write = 1, Read = 0. |
D/C# | Data/Code. Data = 1, Code = 0. |
M/IO# | Memory/IO. Memory = 1, IO = 0. |
Frame (Dec) | Type | Address (Hex) | Data (Hex) | BE3#-BE0# | W/R# D/C# M/IO# | Comments |
5 | F | 0000DE40 | B490070F | 0000 | 001 | LOADALL fetched |
8 | X | (executed 2 bytes at DE40L) | | | | LOADALL begins execution |
011 | R | 0000D8F0 | 01010101 | 0000 | 011 | The 10 "mystery" reads (frames 011-029), exactly 100h bytes beyond the beginning of the LOADALL table. |
013 | R | 0000D8F4 | 02020202 | 0000 | 011 | |
015 | R | 0000D8F8 | 03030303 | 0000 | 011 | |
017 | R | 0000D8FC | 04040404 | 0000 | 011 | |
019 | R | 0000D900 | 05050505 | 0000 | 011 | |
021 | R | 0000D904 | 06060606 | 0000 | 011 | |
023 | R | 0000D908 | 07070707 | 0000 | 011 | |
025 | R | 0000D90C | 08080808 | 0000 | 011 | |
027 | R | 0000D910 | 09090909 | 0000 | 011 | |
029 | R | 0000D914 | 0A0A0A0A | 0000 | 011 | |
031 | R | 0000D7F0 | 7FFFFFE0 | 0000 | 011 | CR0 |
033 | R | 0000D7F4 | 00000002 | 0000 | 011 | EFLAGS |
035 | R | 0000D7F8 | 00000133 | 0000 | 011 | EIP |
037 | R | 0000D7FC | 66666666 | 0000 | 011 | EDI |
039 | R | 0000D800 | 77777777 | 0000 | 011 | ESI |
041 | R | 0000D804 | 55555555 | 0000 | 011 | EBP |
043 | R | 0000D808 | 88888888 | 0000 | 011 | ESP |
045 | R | 0000D80C | 22222222 | 0000 | 011 | EBX |
047 | R | 0000D810 | 44444444 | 0000 | 011 | EDX |
049 | R | 0000D814 | 33333333 | 0000 | 011 | ECX |
051 | R | 0000D818 | 11111111 | 0000 | 011 | EAX |
053 | R | 0000D81C | FFFF0FF0 | 0000 | 011 | DR6 |
055 | R | 0000D820 | 0000D402 | 0000 | 011 | DR7 |
057 | R | 0000D824 | xxxx0000 | 1100 | 011 | TR Register |
059 | R | 0000D828 | xxxx0000 | 1100 | 011 | LDT Register |
061 | R | 0000D82C | xxxx5555 | 1100 | 011 | GS Register |
063 | R | 0000D830 | xxxx4444 | 1100 | 011 | FS Register |
065 | R | 0000D834 | xxxx2222 | 1100 | 011 | DS Register |
067 | R | 0000D838 | xxxx6666 | 1100 | 011 | SS Register |
069 | R | 0000D83C | xxxx1111 | 1100 | 011 | CS Register |
071 | R | 0000D840 | xxxx3333 | 1100 | 011 | ES Register |
073 | R | 0000D844 | 00008900 | 0000 | 011 | TSS Descriptor Cache |
075 | R | 0000D848 | 00070000 | 0000 | 011 | |
077 | R | 0000D84C | 00000800 | 0000 | 011 | |
079 | R | 0000D850 | 00000000 | 0000 | 011 | IDT Descriptor Cache |
081 | R | 0000D854 | 00000000 | 0000 | 011 | |
083 | R | 0000D858 | 000003FF | 0000 | 011 | |
085 | R | 0000D85C | 00000000 | 0000 | 011 | GDT Descriptor Cache |
087 | R | 0000D860 | 00000000 | 0000 | 011 | |
089 | R | 0000D864 | 00000000 | 0000 | 011 | |
091 | R | 0000D868 | 00008200 | 0000 | 011 | LDT Descriptor Cache |
093 | R | 0000D86C | 00090000 | 0000 | 011 | |
095 | R | 0000D870 | 00000088 | 0000 | 011 | |
097 | R | 0000D874 | 00008300 | 0000 | 011 | GS Descriptor Cache |
099 | R | 0000D878 | 00050000 | 0000 | 011 | |
101 | R | 0000D87C | 0000FFFF | 0000 | 011 | |
103 | R | 0000D880 | 00009300 | 0000 | 011 | FS Descriptor Cache |
105 | R | 0000D884 | 00040000 | 0000 | 011 | |
107 | R | 0000D888 | 0000FFFF | 0000 | 011 | |
109 | R | 0000D88C | 00009300 | 0000 | 011 | DS Descriptor Cache |
111 | R | 0000D890 | 00020000 | 0000 | 011 | |
113 | R | 0000D894 | 0000FFFF | 0000 | 011 | |
115 | R | 0000D898 | 00009300 | 0000 | 011 | SS Descriptor Cache |
117 | R | 0000D89C | 00060000 | 0000 | 011 | |
119 | R | 0000D8A0 | 0000FFFF | 0000 | 011 | |
121 | R | 0000D8A4 | 00009B00 | 0000 | 011 | CS Descriptor Cache |
123 | R | 0000D8A8 | 0000DD30 | 0000 | 011 | |
125 | R | 0000D8AC | 0000FFFF | 0000 | 011 | |
127 | R | 0000D8B0 | 00009300 | 0000 | 011 | ES Descriptor Cache |
129 | R | 0000D8B4 | 00030000 | 0000 | 011 | |
131 | R | 0000D8B8 | 00FFFFFF | 0000 | 011 | |