CPU Cache - Address Translation

Address Translation

Most general-purpose CPUs implement some form of virtual memory. To summarize, each program running on the machine sees its own simplified address space, which contains code and data for that program only. Each program uses this virtual address space without regard for where it resides in physical memory.

Virtual memory requires the processor to translate virtual addresses generated by the program into physical addresses in main memory. The portion of the processor that does this translation is known as the memory management unit (MMU). The fast path through the MMU can perform those translations stored in the translation lookaside buffer (TLB), which is a cache of mappings from the operating system's page table.
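The relationship between the TLB and the page table can be sketched in a few lines of Python. This is only an illustrative model, not a description of any real MMU: the 4 kB page size, the contents of `page_table`, and the names `tlb`, `stats`, and `translate` are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed 4 kB pages

# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0x00000: 0x2A, 0x00001: 0x07, 0x12345: 0x99}

tlb = {}                           # cache of recently used page-table entries
stats = {"hits": 0, "misses": 0}   # counters for the two paths


def translate(vaddr):
    """Return the physical address for vaddr, consulting the TLB first."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:
        # Fast path: the mapping is already cached in the TLB.
        stats["hits"] += 1
        pfn = tlb[vpn]
    else:
        # Slow path: look the mapping up in the page table, then cache it.
        # A missing entry (KeyError) would correspond to a page fault.
        stats["misses"] += 1
        pfn = page_table[vpn]
        tlb[vpn] = pfn
    return pfn * PAGE_SIZE + offset
```

In this model, the first call to `translate(0x1004)` takes the slow path, and a later access to the same page, such as `translate(0x1008)`, is served from `tlb`.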

For the purposes of the present discussion, there are three important features of address translation:

Latency: The physical address is available from the MMU some time, perhaps a few cycles, after the virtual address is available from the address generator.

Aliasing: Multiple virtual addresses can map to a single physical address. Most processors guarantee that all updates to that single physical address will happen in program order. To deliver on that guarantee, the processor must ensure that only one copy of a physical address resides in the cache at any given time.

Granularity: The virtual address space is broken up into pages. For instance, a 4 GB virtual address space might be cut up into 1048576 pages of 4 kB size, each of which can be independently mapped. There may be multiple page sizes supported; see virtual memory for elaboration.
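The arithmetic behind the granularity example can be checked with a short sketch. It assumes a 32-bit virtual address and a single 4 kB page size; the names `OFFSET_BITS`, `NUM_PAGES`, and `split` are chosen here for illustration only.

```python
ADDRESS_BITS = 32        # 4 GB virtual address space
PAGE_SIZE = 4 * 1024     # 4 kB pages

# A 4 kB page needs 12 bits to address every byte inside it.
OFFSET_BITS = PAGE_SIZE.bit_length() - 1

# The remaining address bits select the page.
NUM_PAGES = 2 ** (ADDRESS_BITS - OFFSET_BITS)


def split(vaddr):
    """Split a virtual address into (virtual page number, page offset)."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)
```

Only the page number is translated; the offset within the page is the same in the virtual and the physical address.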

A historical note: some early virtual memory systems were very slow, because they required an access to the page table (held in main memory) before every programmed access to main memory. With no caches, this effectively cut the speed of the machine in half. The first hardware cache used in a computer system was not actually a data or instruction cache, but rather a TLB.

Caches can be divided into four types, based on whether the index and the tag each correspond to physical or virtual addresses:

Physically indexed, physically tagged (PIPT) caches use the physical address for both the index and the tag. While this is simple and avoids problems with aliasing, it is also slow, as the physical address must be looked up (which could involve a TLB miss and access to main memory) before that address can be looked up in the cache.

Virtually indexed, virtually tagged (VIVT) caches use the virtual address for both the index and the tag. This caching scheme can result in much faster lookups, since the MMU doesn't need to be consulted first to determine the physical address for a given virtual address. However, VIVT suffers from aliasing problems, where several different virtual addresses may refer to the same physical address. The result is that such addresses would be cached separately despite referring to the same memory, causing coherency problems. Another problem is homonyms, where the same virtual address maps to several different physical addresses, for example in different address spaces. It is not possible to distinguish these mappings by looking only at the virtual index, though potential solutions include: flushing the cache after a context switch, forcing address spaces to be non-overlapping, tagging the virtual address with an address space ID (ASID), or using physical tags. Additionally, virtual-to-physical mappings can change, which requires flushing the affected cache lines, as their virtual addresses (VAs) would no longer be valid.

Virtually indexed, physically tagged (VIPT) caches use the virtual address for the index and the physical address in the tag. The advantage over PIPT is lower latency, as the cache line can be looked up in parallel with the TLB translation; however, the tag can't be compared until the physical address is available. The advantage over VIVT is that since the tag has the physical address, the cache can detect homonyms. VIPT requires more tag bits than PIPT: any index bits that lie above the page offset may differ between the virtual and physical addresses, so the corresponding physical address bits must be stored in the tag as well.

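Physically indexed, virtually tagged (PIVT) caches are usually regarded as only theoretical, since they offer little practical benefit: the physical index means the lookup must wait for address translation, as in PIPT, while the virtual tag brings back the homonym problem of VIVT.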
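The aliasing problem described for virtual tags, and the way physical tags avoid it, can be illustrated with a toy model. This is a deliberately simplified sketch: the cache is fully associative (so indexing is ignored and only the choice of tag is modelled), it never evicts or writes back, and `page_table`, `ToyCache`, and `demo` are hypothetical names introduced for the example.

```python
PAGE_SIZE = 4096
LINE_SIZE = 64

# Hypothetical mappings: two virtual pages alias the same physical frame.
page_table = {0x10: 0x5, 0x20: 0x5}
memory = {}  # physical line number -> value; unwritten lines read as 0


def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset


class ToyCache:
    """Toy cache whose lines are keyed by a virtual or a physical line number."""

    def __init__(self, virtually_tagged):
        self.virtually_tagged = virtually_tagged
        self.lines = {}

    def _key(self, vaddr):
        addr = vaddr if self.virtually_tagged else translate(vaddr)
        return addr // LINE_SIZE

    def write(self, vaddr, value):
        self.lines[self._key(vaddr)] = value

    def read(self, vaddr):
        key = self._key(vaddr)
        if key not in self.lines:
            # Miss: fill the line from (physical) memory.
            self.lines[key] = memory.get(translate(vaddr) // LINE_SIZE, 0)
        return self.lines[key]


def demo(virtually_tagged):
    cache = ToyCache(virtually_tagged)
    a, b = 0x10 * PAGE_SIZE, 0x20 * PAGE_SIZE  # two names for one physical line
    cache.read(b)         # bring the line in under its second name
    cache.write(a, 42)    # update it through the first name
    return cache.read(b)  # value seen through the second name
```

With virtual tags, `demo(True)` returns the stale value 0, because the two virtual addresses occupy separate cache lines. With physical tags, `demo(False)` returns 42, because both names resolve to the same line.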

Load latency is crucial to CPU performance, and so most modern level-1 caches are virtually indexed, which at least allows the MMU's TLB lookup to proceed in parallel with fetching the data from the cache RAM.

But virtual indexing is not the best choice for all cache levels. The cost of dealing with virtual aliases grows with cache size, and as a result most level-2 and larger caches are physically indexed.
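One way to see why the cost grows with cache size is to count how many index bits lie above the page offset. Those bits come from the untranslated part of the virtual address, so each one doubles the number of places in which a single physical line could be cached. The sketch below assumes power-of-two sizes and a 4 kB page; the function name `alias_positions` and the cache configurations in the usage note are illustrative, not taken from any particular processor.

```python
def alias_positions(cache_bytes, ways, page_bytes=4096):
    """Number of distinct set positions one physical line can occupy
    in a virtually indexed cache. A result of 1 means the index is
    taken entirely from the page offset, so aliases share a set.
    """
    # Index bits plus line-offset bits together cover one way of the cache.
    way_bytes = cache_bytes // ways
    return max(1, way_bytes // page_bytes)
```

Under these assumptions, a 32 kB 8-way cache gives `alias_positions(32 * 1024, 8) == 1`, a 64 kB 4-way cache gives 4, and a 1 MB 8-way cache gives 32, which is consistent with physical indexing being preferred for larger caches.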

Caches have historically used both virtual and physical addresses for the cache tags, although virtual tagging is now uncommon. If the TLB lookup can finish before the cache RAM lookup, then the physical address is available in time for tag compare, and there is no need for virtual tagging. Large caches, then, tend to be physically tagged, and only small, very low latency caches are virtually tagged. In recent general-purpose CPUs, virtual tagging has been superseded by virtual hints (vhints).
