X86 Architecture - Segmentation

Segmentation

Further information: x86 memory segmentation

Minicomputers during the late 1970s were running up against the 64-kB limit of 16-bit addressing as memory became cheaper. Some minicomputers, such as the PDP-11, used complex bank-switching schemes; Digital's VAX instead relied on redesigned, much more expensive processors that could directly handle 32-bit addressing and data. The original 8086, developed from the simple 8080 microprocessor and aimed primarily at very small, inexpensive computers and other specialized devices, instead adopted simple segment registers that increased the memory address width by only 4 bits. By multiplying a 16-bit segment value by 16 and adding a 16-bit offset, the processor formed a 20-bit address that could reach a total of one megabyte (1,048,576 bytes), quite a large amount for a small computer at the time. The concept of segment registers was not new: many mainframes used them to switch quickly between tasks. In practice, the x86 implementation was (and is) much criticized, because it greatly complicated many common programming tasks and compilers. The architecture soon allowed linear 32-bit addressing (starting with the 80386 in late 1985), but major actors such as Microsoft took several years to convert their 16-bit-based systems. The 80386 (and 80486) was therefore largely used as a fast, but still 16-bit-based, 8086 for many years.

Data and code could be managed within "near" 16-bit segments inside this 1 MB address space, or a compiler could operate in a "far" mode using 32-bit segment:offset pairs, which still reached only 1 MB. While that would also prove quite limiting by the mid-1980s, it worked for the emerging PC market and made it very simple to translate software from the older 8008, 8080, 8085, and Z80 to the newer processor. In 1985, the 386 design introduced 32-bit offset registers, which effectively factored out the 16-bit segment addressing model.

In real mode, segmentation is achieved by shifting the segment address left by 4 bits and adding an offset to obtain a final 20-bit address. For example, if DS is A000h and SI is 5677h, DS:SI will point at the absolute address DS × 10h + SI = A5677h. Thus the total address space in real mode is 2^20 bytes, or 1 MB, quite an impressive figure for 1978. All memory addresses consist of both a segment and an offset; every type of access (code, data, or stack) has a default segment register associated with it (for data the register is usually DS, for code it is CS, and for stack it is SS). For data accesses, the segment register can be explicitly specified (using a segment override prefix) to use any of the four segment registers.
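The real-mode calculation can be sketched in a few lines of Python. This is only an illustration: the function name real_mode_address is invented for the example, and the masking to 20 bits assumes the wraparound behaviour of the original 8086, which has only 20 address lines.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Return the 20-bit address for a 16-bit segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shift the segment left by 4 bits (multiply by 10h), add the offset,
    # and keep only the low 20 bits (assumed 8086-style wraparound).
    return ((segment << 4) + offset) & 0xFFFFF


# The DS:SI example from the text: A000h:5677h
print(hex(real_mode_address(0xA000, 0x5677)))  # 0xa5677
```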

In this scheme, two different segment/offset pairs can point at a single absolute location. Thus, if DS is A111h and SI is 4567h, DS:SI will point at the same A5677h as above. Because there are only four segment registers, it is also impossible to use more than four segments at once. CS and SS are vital for the correct functioning of the program, so only DS and ES can be used to point to data segments outside the program (or, more precisely, outside the currently executing segment of the program) or the stack.
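The overlap between segments can be made concrete by listing every pair that resolves to one address. The sketch below is illustrative only: the helper name aliases is invented for the example, and it ignores address wraparound for simplicity.

```python
def aliases(address: int):
    """Yield every 16-bit (segment, offset) pair that resolves to address.

    Simplifying assumption: wraparound past the top of memory is ignored.
    """
    for segment in range(0x10000):
        offset = address - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            yield segment, offset


pairs = set(aliases(0xA5677))
print((0xA000, 0x5677) in pairs)  # True
print((0xA111, 0x4567) in pairs)  # True
print(len(pairs))                 # 4096
```

Successive segment values start only 16 bytes apart, so an address in the middle of memory, such as A5677h, falls inside 4096 overlapping 64-kB segments.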

In protected mode, a segment register no longer contains the physical address of the beginning of a segment, but instead contains a "selector" that points to a system-level structure called a segment descriptor. A segment descriptor contains the physical address of the beginning of the segment, the length of the segment, and access permissions to that segment. The offset is checked against the length of the segment, and an offset referring to a location outside the segment causes an exception. An offset referring to a location inside the segment is combined with the physical address of the beginning of the segment to get the physical address corresponding to that offset.
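The protected-mode lookup can be modelled roughly as follows. This is a heavily simplified sketch, not the processor's actual algorithm: the names Descriptor, SegmentFault, and translate are invented for the example, the permission model is reduced to a single writable flag, and privilege levels, the local descriptor table, and granularity are all ignored. The descriptor index is taken from the upper 13 bits of the selector, and the limit is treated as the highest valid offset.

```python
from dataclasses import dataclass


class SegmentFault(Exception):
    """Stands in for the exception the processor raises on a failed check."""


@dataclass(frozen=True)
class Descriptor:
    base: int       # address of the beginning of the segment
    limit: int      # highest valid offset (assumed byte granularity)
    writable: bool  # simplified stand-in for the access permissions


def translate(table, selector: int, offset: int, write: bool = False) -> int:
    """Resolve selector:offset through a descriptor table (simplified)."""
    descriptor = table[selector >> 3]  # low 3 bits are not part of the index
    if offset > descriptor.limit:
        raise SegmentFault("offset lies outside the segment")
    if write and not descriptor.writable:
        raise SegmentFault("segment is not writable")
    return descriptor.base + offset


# Hypothetical table: entry 0 is unused, entry 1 is a read-only 4-kB segment.
table = [None, Descriptor(base=0x100000, limit=0x0FFF, writable=False)]
print(hex(translate(table, 0x08, 0x0010)))  # 0x100010
```

Calling translate(table, 0x08, 0x1000) raises SegmentFault in this model, because offset 1000h lies just past the segment's limit of 0FFFh.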

The segmented nature of the architecture can make programming and compiler design difficult, because the choice between near and far pointers affects performance.
