Vectorization (parallel Computing) - Run-time Vs. Compile-time

Run-time Vs. Compile-time

Some vectorizations cannot be fully checked at compile time. Compile-time optimization requires an explicit array index. Library functions can also defeat optimization if the data they process is supplied by the caller. Even in these cases, run-time optimization can still vectorize loops on-the-fly.

This run-time check is made in the prelude stage and directs the flow to vectorized instructions if possible, otherwise reverting to standard processing, depending on the variables that are being passed on the registers or scalar variables.

The following code can easily be vectorized on compile time, as it doesn't have any dependence on external parameters. Also, the language guarantees that neither will occupy the same region in memory as any other variable, as they are local variables and live only in the execution stack.

int a; int b; // initialize b for (i = 0; i<128; i++) a = b + 5;

On the other hand, the code below has no information on memory positions, because the references are pointers and the memory they point to lives in the heap.

int *a = malloc(128*sizeof(int)); int *b = malloc(128*sizeof(int)); // initialize b for (i = 0; i<128; i++, a++, b++) *a = *b + 5; // ... // ... // ... free(b); free(a);

A quick run-time check on the address of both a and b, plus the loop iteration space (128) is enough to tell if the arrays overlap or not, thus revealing any dependencies.

Read more about this topic: Vectorization (parallel Computing)