Bitap Algorithm - Exact Searching

Exact Searching

The bitap algorithm for exact string searching, in full generality, looks like this when implemented in C:

#include #include typedef char BIT; /* needs only to hold the values 0 and 1 */ const char *bitap_search(const char *text, const char *pattern) { const char *result = NULL; int m = strlen(pattern); BIT *R; int i, k; if (pattern == '\0') return text; /* Initialize the bit array R */ R = malloc((m+1) * sizeof *R); R = 1; for (k=1; k <= m; ++k) R = 0; for (i=0; text != '\0'; ++i) { /* Update the bit array. */ for (k=m; k >= 1; --k) R = R && (text == pattern); if (R) { result = (text+i - m) + 1; break; } } free(R); return result; }

Bitap distinguishes itself from other well-known string searching algorithms in its natural mapping onto simple bitwise operations, as in the following modification of the above program. Notice that in this implementation, counterintuitively, each bit with value zero indicates a match, and each bit with value 1 indicates a non-match. The same algorithm can be written with the intuitive semantics for 0 and 1, but in that case we must introduce another instruction into the inner loop to set R |= 1. In this implementation, we take advantage of the fact that left-shifting a value shifts in zeros on the right, which is precisely the behavior we need.

Notice also that we require CHAR_MAX additional bitmasks in order to convert the (text == pattern) condition in the general implementation into bitwise operations. Therefore, the bitap algorithm performs better when applied to inputs over smaller alphabets.

#include #include const char *bitap_bitwise_search(const char *text, const char *pattern) { int m = strlen(pattern); unsigned long R; unsigned long pattern_mask; int i; if (pattern == '\0') return text; if (m > 31) return "The pattern is too long!"; /* Initialize the bit array R */ R = ~1; /* Initialize the pattern bitmasks */ for (i=0; i <= CHAR_MAX; ++i) pattern_mask = ~0; for (i=0; i < m; ++i) pattern_mask] &= ~(1UL << i); for (i=0; text != '\0'; ++i) { /* Update the bit array */ R |= pattern_mask]; R <<= 1; if (0 == (R & (1UL << m))) return (text + i - m) + 1; } return NULL; }

Read more about this topic:  Bitap Algorithm

Famous quotes containing the words exact and/or searching:

    If we define a sign as an exact reference, it must include symbol because a symbol is an exact reference too. The difference seems to be that a sign is an exact reference to something definite and a symbol an exact reference to something indefinite.
    William York Tindall (1903–1981)

    Through searching out origins, one becomes a crab. The historian looks backwards, and finally he also believes backwards.
    Friedrich Nietzsche (1844–1900)