Most people never knew that the Pentium's original design included 36-bit addressing and the capability to use 2M pages. These extensions were known as Page Address Extensions (PAE) and were to be enabled in CR4. When CR4.PAE=1 (CR4[5]=1), page address extensions were enabled. When CR4.PAE=0, A[35..32] were forced to 0, regardless of what addresses could be generated in protected mode (for example, by a descriptor whose base lies near 4G combined with an offset that reaches past the top of the 4G address space). Even when CR4.PAE=1, addresses above 4G would not be generated unless they were the result of a paging translation. The only means to access memory above 4G was through these extensions to page mode. This document describes PAE based on what little I know from the Pentium and from preliminary P6 literature. It also covers extensions to PAE that are exclusive to the P6.
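The wraparound behind that last example can be modelled in a few lines. This is a minimal sketch, assuming paging is off (so the linear address is the physical address) and that the carry out of bit 31 is simply discarded when a segment base and an offset are added; the function names are hypothetical.

```python
# Minimal sketch, assuming paging is off (physical address == linear address)
# and that the carry out of bit 31 is discarded. Names are hypothetical.

MASK_32 = 0xFFFF_FFFF


def linear_address(seg_base: int, offset: int) -> int:
    """Segment base + offset, truncated to 32 bits (the carry is dropped)."""
    return (seg_base + offset) & MASK_32


def physical_no_paging(seg_base: int, offset: int) -> int:
    """Without a paging translation, A[35..32] of the result are always 0."""
    return linear_address(seg_base, offset)


# A descriptor based near 4G plus an offset that reaches past 4G:
addr = physical_no_paging(0xFFFF_F000, 0x0000_2000)
print(hex(addr))   # 0x1000: wrapped around instead of reaching 0x1_0000_1000
print(addr >> 32)  # 0: A[35..32] are zero
```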
Whether PAE was ever implemented in the Pentium beyond the conceptual stage is not known, but vestiges of its existence are visible throughout the Pentium documentation and architecture. There are at least four references to 2M pages in the various Pentium manuals [1,2,3,4]. Beyond these documentation references, CR4[5] is marked reserved but was to enable PAE; CPUID.flags[6] is marked reserved but was to indicate the presence of PAE; and the MSR TR8 is marked reserved but was to contain the upper 4 address bits used for TLB testability. Now it appears that the P6 will implement 36-bit addressing and 2M pages.
Supporting 36-bit addressing requires substantial changes to the paging mechanism. 32-bit linear addresses are still used, but they are translated to 36-bit physical addresses. Intel chose a three-tier paging mechanism to support PAE for 4K pages, and a two-tier mechanism for 2M pages. When CR4.PAE=1, CR3 points to a small table of Page Directory Pointers (PDPs). Each PDP entry references a separate page directory. Each page directory entry points either to a page table (for 4K pages) or directly to the page frame (for 2M pages). Figure 1 details all of the CPU structures involved in page translation while PAE is enabled. For comparison, Figure 2 details the corresponding structures while Page Size Extensions (PSE) are enabled (4-Mbyte pages).
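As a quick arithmetic check on this hierarchy, the entry counts given later in this section (4 PDP entries, plus 512-entry page directories and page tables of 8-byte entries) multiply out to exactly the 4G linear address space for both page sizes, while a 36-bit physical address reaches 64G. The constant names below are illustrative only.

```python
# Worked arithmetic for the PAE hierarchy (constant names are illustrative).
PDP_ENTRIES = 4     # selected by A[31..30]
PD_ENTRIES = 512    # selected by A[29..21]
PT_ENTRIES = 512    # selected by A[20..12]
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024
ENTRY_BYTES = 8

# Three tiers for 4K pages, two tiers for 2M pages: both span 2^32 bytes.
linear_span_4k = PDP_ENTRIES * PD_ENTRIES * PT_ENTRIES * PAGE_4K
linear_span_2m = PDP_ENTRIES * PD_ENTRIES * PAGE_2M

print(linear_span_4k == 2**32, linear_span_2m == 2**32)     # True True
print(2**36 // 2**30)                                       # 64 (Gbytes of physical space)
print(PD_ENTRIES * ENTRY_BYTES, PDP_ENTRIES * ENTRY_BYTES)  # 4096 32
```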
Figure 1 -- Paging Structures for PAE
Figure 2 -- Paging Structures for PSE
In addition to CR4.PAE, which enables Page Address Extensions, CR4 contains another addition that improves page-mode performance. CR4.PGE (bit 7) enables Paging Global Extensions (PGE). PGE determines whether a move to CR3 flushes all of the PTEs from the TLB, or only those whose G (global) bit is clear. Task switches, which implicitly load CR3, flush the TLB in the same manner under the control of CR4.PGE.
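The flushing rule can be expressed as a toy TLB model. This is a simplified sketch, not a description of real TLB hardware: the `Tlb` and `TlbEntry` classes are hypothetical, capacity and replacement are ignored, and only the effect of a CR3 load on cached translations is modelled.

```python
from dataclasses import dataclass, field


@dataclass
class TlbEntry:
    frame: int
    global_page: bool  # the G bit from the page-table entry


@dataclass
class Tlb:
    pge: bool = False                            # models CR4.PGE
    entries: dict = field(default_factory=dict)  # linear page number -> TlbEntry

    def fill(self, linear_page: int, frame: int, global_page: bool) -> None:
        self.entries[linear_page] = TlbEntry(frame, global_page)

    def load_cr3(self) -> None:
        """Effect of a MOV to CR3, or of a task switch that loads CR3."""
        if self.pge:
            # PGE enabled: keep entries whose G bit is set, flush the rest.
            self.entries = {p: e for p, e in self.entries.items() if e.global_page}
        else:
            # PGE disabled: every entry is flushed, G bit or not.
            self.entries.clear()


tlb = Tlb(pge=True)
tlb.fill(0xC0000, 0x00100, global_page=True)   # e.g. a shared page marked global
tlb.fill(0x00400, 0x12345, global_page=False)  # an ordinary per-task page
tlb.load_cr3()
print(sorted(hex(page) for page in tlb.entries))  # ['0xc0000']
```

Marking pages that are identical in every task as global lets them survive the CR3 reload, which is where the performance gain comes from.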
As shown in Figure 1, CR3 is still a 32-bit register, so the PDP table must reside within the first 4G of the physical address space. Each PDP entry is selected by the upper 2 bits of the linear address, A[31..30], so the PDP table contains only 4 entries. Each PDP entry holds the physical address of a page directory and is 64 bits wide, though only 36 bits of address are used; therefore each PDP entry can reference a page directory anywhere in the 64G address space. The index into the Page Directory (PDE) is determined by linear address bits A[29..21], so the Page Directory is limited to 512 (2^9) entries of 8 bytes each. Even though the Page Directory has been reduced to 512 entries, it occupies the same amount of memory as when CR4.PAE=0 (4096 bytes), because each entry has doubled in size to 8 bytes. For 4K pages, each 8-byte PDE holds the physical address of a Page Table. For 2M pages, each 8-byte PDE holds the physical address of the page frame itself. For 4K pages, the index of the Page Table Entry (PTE) is determined by linear address bits A[20..12]. Like the Page Directory, each Page Table is limited to 512 entries of 8 bytes each, and each 8-byte entry holds the physical Page Frame Address (PFA). Figure 3 shows the page translation for 4K pages while CR4.PAE=1.
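The 4K walk can be sketched as follows. This is a minimal model under assumptions not stated in the text: physical memory is a Python dict from physical addresses to 64-bit entries, `cr3` is taken to hold the PDP table's physical base directly, the present bit is assumed to be bit 0 of every entry, and bits 35..12 of an entry are taken as the next table or frame base. Function names are hypothetical.

```python
# Minimal sketch of a PAE 4K-page walk. Assumptions (not from the original):
# memory is a dict {physical address: 64-bit entry}, cr3 holds the PDP table
# base directly, P is bit 0, and bits 35..12 hold the next-level base.

PRESENT = 1 << 0
BASE_MASK = 0xF_FFFF_F000  # bits 35..12 of an entry


def split_4k(linear: int) -> tuple:
    """Split a 32-bit linear address into (PDP, PD, PT, offset) fields."""
    pdp = (linear >> 30) & 0x3   # A[31..30]: 4 PDP entries
    pd = (linear >> 21) & 0x1FF  # A[29..21]: 512 PDEs
    pt = (linear >> 12) & 0x1FF  # A[20..12]: 512 PTEs
    offset = linear & 0xFFF      # A[11..00]: byte within the 4K page
    return pdp, pd, pt, offset


def read_entry(mem: dict, table_base: int, index: int) -> int:
    """Fetch an 8-byte entry; a missing or not-present entry is a page fault."""
    addr = table_base + 8 * index
    entry = mem.get(addr, 0)
    if not entry & PRESENT:
        raise LookupError(f"page fault: not-present entry at {addr:#x}")
    return entry


def translate_4k(mem: dict, cr3: int, linear: int) -> int:
    pdp, pd, pt, offset = split_4k(linear)
    pdpte = read_entry(mem, cr3, pdp)             # PDP table lies below 4G (CR3 is 32 bits)
    pde = read_entry(mem, pdpte & BASE_MASK, pd)  # page directory may be anywhere in 64G
    pte = read_entry(mem, pde & BASE_MASK, pt)    # so may the page table
    return (pte & BASE_MASK) | offset             # 36-bit physical address


mem = {
    0x1000 + 8 * 1: 0x2000 | PRESENT,                # PDP entry 1 -> page directory at 0x2000
    0x2000 + 8 * 3: 0x9_0000_0000 | PRESENT,         # PDE 3 -> page table above 4G
    0x9_0000_0000 + 8 * 5: 0xA_BCDE_F000 | PRESENT,  # PTE 5 -> 4K frame above 4G
}
linear = (1 << 30) | (3 << 21) | (5 << 12) | 0x123
print(hex(translate_4k(mem, 0x1000, linear)))  # 0xabcdef123
```

The example deliberately places the page table and the frame above 4G, which only the 64-bit entries make reachable, while the PDP table stays below 4G as the 32-bit CR3 requires.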
Figure 3 -- Page Translation for 4K Page Address Extensions
Page translation for 2M pages is very similar to 4M page translation under PSE. The main differences between the two mechanisms are the addition of the PDP lookup and the number of index bits used for the page directory (9 bits here, instead of the 10 bits, A[31..22], used for 4M pages). As with 4K page translations under PAE, each PDP entry holds the physical address of a page directory. The index into the Page Directory (PDE) is determined by linear address bits A[29..21]. The remaining linear address bits, A[20..00], index directly into the page frame. Since the offset is 21 bits wide, the page size is 2M (2^21 bytes). Figure 4 shows a diagram of page translation for 2M pages.
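The same style of sketch covers 2M pages and contrasts the address split with PSE 4M pages. The dict-based memory and the function names are hypothetical, `cr3` is again assumed to hold the PDP table base, the present bit is assumed to be bit 0, and PDE.PS is assumed to be bit 7, its usual position in an x86 PDE.

```python
# Sketch of a PAE 2M-page walk, plus the PSE 4M split for comparison.
# Assumed: dict-based memory, cr3 holds the PDP table base, P is bit 0,
# and PDE.PS is bit 7. Names are hypothetical.

PRESENT = 1 << 0
PS = 1 << 7
BASE_MASK = 0xF_FFFF_F000      # bits 35..12: page-directory base in a PDP entry
FRAME_2M_MASK = 0xF_FFE0_0000  # bits 35..21: 2M-aligned frame base in a PDE


def split_2m(linear: int) -> tuple:
    pdp = (linear >> 30) & 0x3   # A[31..30]
    pd = (linear >> 21) & 0x1FF  # A[29..21]: 9 index bits
    offset = linear & 0x1F_FFFF  # A[20..00]: 21 bits -> 2M page
    return pdp, pd, offset


def split_pse_4m(linear: int) -> tuple:
    pd = (linear >> 22) & 0x3FF  # A[31..22]: 10 index bits, no PDP level
    offset = linear & 0x3F_FFFF  # A[21..00]: 22 bits -> 4M page
    return pd, offset


def translate_2m(mem: dict, cr3: int, linear: int) -> int:
    pdp, pd, offset = split_2m(linear)
    pdpte = mem.get(cr3 + 8 * pdp, 0)
    pde = mem.get((pdpte & BASE_MASK) + 8 * pd, 0) if pdpte & PRESENT else 0
    if not pde & PRESENT or not pde & PS:
        raise LookupError("no present 2M mapping for this address")
    return (pde & FRAME_2M_MASK) | offset


mem = {
    0x1000 + 8 * 2: 0x3000 | PRESENT,              # PDP entry 2 -> page directory at 0x3000
    0x3000 + 8 * 7: 0x7_8000_0000 | PS | PRESENT,  # PDE 7 -> 2M frame above 4G
}
linear = 0x80E1_2345
print(hex(translate_2m(mem, 0x1000, linear)))    # 0x780012345
print([hex(v) for v in split_2m(linear)])        # ['0x2', '0x7', '0x12345']
print([hex(v) for v in split_pse_4m(linear)])    # ['0x203', '0x212345']
```

The last two lines show the same linear address carved up two ways: PAE spends A[31..30] on the PDP index and keeps 21 offset bits, while PSE uses a single 10-bit directory index and a 22-bit offset.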
Figure 4 -- Page Translation for 2M Page Address Extensions
It must be determined whether PAE and PSE are mutually exclusive, and which has higher precedence. Likewise, the role of the PDE.PS bit when page address extensions are enabled must be defined. I will assume the two features are mutually exclusive, and that PAE has higher precedence than PSE. Under these assumptions, Table 1 describes the possible combinations of PAE, PSE, and PDE.PS.
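Under the two assumptions just stated (the features are mutually exclusive, and PAE takes precedence), the page size selected by a PDE can be written as a small decision function. This encodes the working assumption above, not confirmed P6 behavior, and `page_size` is a hypothetical helper.

```python
from itertools import product


def page_size(pae: bool, pse: bool, pde_ps: bool) -> str:
    """Page size chosen by a PDE, assuming PAE and PSE are treated as
    mutually exclusive and PAE takes precedence."""
    if pae:
        # PAE wins: CR4.PSE is ignored and PDE.PS selects 2M vs 4K.
        return "2M" if pde_ps else "4K"
    if pse:
        return "4M" if pde_ps else "4K"
    return "4K"  # no extension enabled: PDE.PS has no effect


for pae, pse, ps in product((0, 1), repeat=3):
    print(f"PAE={pae} PSE={pse} PDE.PS={ps} -> {page_size(pae, pse, ps)}")
```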
Table 1 -- Control bits for Paging Extensions
Definition of fields in paging structure figures: