Inline Caching - Inline Caching

Inline Caching

The concept of inline caching is based on the empirical observation that the objects that occur at a particular call site are often of the same type. In those cases, performance can be increased greatly by storing the result of a method lookup "inline", i.e. directly at the call site. To facilitate this process, call sites are assigned different states. Initially, a call site is considered to be "uninitialized". Once the language runtime reaches a particular uninitialized call site, it performs the dynamic lookup, stores the result at the call site and changes its state to "monomorphic". If the language runtime reaches the same call site again, it retrieves the callee from it and invokes it directly without performing any more lookups. To account for the possibility that objects of different types may occur at the same call site, the language runtime also has to insert guard conditions into the code. Most commonly, these are inserted into the preamble of the callee rather than at the call site to better exploit branch prediction and to save space due to one copy in the preamble versus multiple copies at each call site. If a call site that is in the "monomorphic" state encounters a type other than the one it expects, it has to change back to the "uninitialized" state and perform a full dynamic lookup again.

The canonical implementation is a register load of a constant followed by a call instruction. The "uninitialized" state is better called "unlinked". The register is loaded with the message selector (typically the address of some object) and the call is to the run-time routine that will look-up the message in the class of the current receiver, using the first-level method lookup cache above. The run-time routine then rewrites the instructions, changing the load instruction to load the register with the type of the current receiver, and the call instruction to call the preamble of the target method, now "linking" the call site to the target method. Execution then continues immediately following the preamble. A subsequent execution will call the preamble directly. The preamble then derives the type of the current receiver and compares it with that in the register; if they agree the receiver is of the same type and the method continues to execute. If not, the preamble again calls the run-time and various strategies are possible, one being to relink the call-site for the new receiver type.

The performance gains come from having to do one type comparison, instead of at least a type comparison and a selector comparison for the first-level method lookup cache, and from using a direct call (which will benefit from instruction prefetch and pipe-lining) as opposed to the indirect call in a method-lookup or a vtable dispatch.

Read more about this topic: Inline Caching