Intel 8086 - Details - Segmentation

Segmentation

See also: x86 memory segmentation

There are also four 16-bit segment registers (see figure) that allow the 8086 CPU to access one megabyte of memory in an unusual way. Rather than concatenating the segment register with the address register, as in most processors whose address space exceeded their register size, the 8086 shifts the 16-bit segment only four bits left before adding it to the 16-bit offset (16×segment + offset), therefore producing a 20-bit external (or effective or physical) address from the 32-bit segment:offset pair. As a result, each external address can be referred to by 212 = 4096 different segment:offset pairs. The 16-byte separation between segment bases (due to the 4-bit shift) is called a paragraph. Although considered complicated and cumbersome by many programmers, this scheme also has advantages; a small program (less than 64 KB) can be loaded starting at a fixed offset (such as 0) in its own segment, avoiding the need for relocation, with at most 15 bytes of alignment waste.

Compilers for the 8086-family commonly support two types of pointer, near and far. Near pointers are 16-bit offsets implicitly associated with the program's code and/or data segment and so can be used only within parts of a program small enough to fit in one segment. Far pointers are 32-bit segment:offset pairs resolving to 20-bit external addresses. Some compilers also support huge pointers, which are like far pointers except that pointer arithmetic on a huge pointer treats it as a linear 20-bit pointer, while pointer arithmetic on a far pointer wraps around within its 16-bit offset without touching the segment part of the address.

To avoid the need to specify near and far on numerous pointers, data structures, and functions, compilers also support "memory models" which specify default pointer sizes. The tiny (max 64K), small (max 128K), compact (data > 64K), medium (code > 64K), large (code,data > 64K), and huge (individual arrays > 64K) models cover practical combinations of near, far, and huge pointers for code and data. The tiny model means that code and data are shared in a single segment, just as in most 8-bit based processors, and can be used to build .com-files for instance. Precompiled libraries often came in several versions compiled for different memory models.

According to Morse et al., the designers actually contemplated using an 8-bit shift (instead of 4-bit), in order to create a 16 MB physical address space. However, as this would have forced segments to begin on 256-byte boundaries, and 1 MB was considered very large for a microprocessor around 1976, the idea was dismissed. Also, there were not enough pins available on a low-cost 40-pin package for the additional four address bus pins.

In principle, the address space of the x86 series could have been extended in later processors by increasing the shift value, as long as applications obtained their segments from the operating system and did not make assumptions about the equivalence of different segment:offset pairs. In practice the use of "huge" pointers and similar mechanisms was widespread and the flat 32-bit addressing made possible with the 32-bit offset registers in the 80386 eventually extended the limited addressing range in a more general way (see below).

Read more about this topic:  Intel 8086, Details