Bitap Algorithm - Exact Searching

Exact Searching

The bitap algorithm for exact string searching, in full generality, looks like this when implemented in C:

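```c
#include <string.h>
#include <stdlib.h>

typedef char BIT; /* needs only to hold the values 0 and 1 */

const char *bitap_search(const char *text, const char *pattern)
{
    const char *result = NULL;
    int m = strlen(pattern);
    BIT *R;
    int i, k;

    if (pattern[0] == '\0') return text;

    /* Initialize the bit array R: R[0] stands for the empty prefix, which always matches */
    R = malloc((m+1) * sizeof *R);
    if (R == NULL) return NULL;   /* allocation failed */
    R[0] = 1;
    for (k=1; k <= m; ++k)
        R[k] = 0;

    for (i=0; text[i] != '\0'; ++i) {
        /* Update the bit array. Walking k downwards means R[k-1]
           still holds its value from the previous character. */
        for (k=m; k >= 1; --k)
            R[k] = R[k-1] && (text[i] == pattern[k-1]);

        if (R[m]) {
            /* A full match ends at position i */
            result = text + (i - m + 1);
            break;
        }
    }

    free(R);
    return result;
}
```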
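To see the invariant behind the boolean array at work, here is a small variation that does not stop at the first hit. It is a sketch under our own assumptions: the name bitap_count_matches is hypothetical, it uses C99 bool instead of the BIT typedef, and it treats an empty pattern as having no matches. After reading text[i], R[k] is true exactly when the first k characters of the pattern equal the k characters of text ending at position i, so R[m] becomes true once per occurrence, overlapping occurrences included.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts every (possibly overlapping) occurrence of
 * pattern in text with the boolean-array form of bitap.
 * Returns -1 if the state array cannot be allocated. */
int bitap_count_matches(const char *text, const char *pattern)
{
    size_t m = strlen(pattern);
    size_t i, k;
    int count = 0;
    bool *R;

    if (m == 0) return 0;     /* assumption: an empty pattern counts as no match */

    R = malloc((m + 1) * sizeof *R);
    if (R == NULL) return -1;

    R[0] = true;              /* the empty prefix always matches */
    for (k = 1; k <= m; ++k)
        R[k] = false;

    for (i = 0; text[i] != '\0'; ++i) {
        /* Downward walk: R[k-1] still holds the previous step's value. */
        for (k = m; k >= 1; --k)
            R[k] = R[k - 1] && (text[i] == pattern[k - 1]);
        if (R[m])
            ++count;          /* a full match ends at position i */
    }

    free(R);
    return count;
}
```

For example, bitap_count_matches("abababa", "aba") should return 3, because the pattern occurs at offsets 0, 2 and 4.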

Bitap distinguishes itself from other well-known string searching algorithms in its natural mapping onto simple bitwise operations, as in the following modification of the above program. Notice that in this implementation, counterintuitively, each bit with value 0 indicates a match, and each bit with value 1 indicates a non-match. The same algorithm can be written with the intuitive semantics for 0 and 1, but in that case we must add another instruction to the main loop to set R |= 1. In this implementation, we take advantage of the fact that left-shifting a value shifts a zero into bit 0, which is precisely the behavior we need: bit 0 stands for the empty prefix, and the empty prefix always matches.

Notice also that we need one additional bitmask for every possible character value in order to convert the (text[i] == pattern[k-1]) condition in the general implementation into bitwise operations. Since setting up these masks takes time and space proportional to the alphabet size, the bitap algorithm performs better when applied to inputs over smaller alphabets.

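```c
#include <string.h>
#include <limits.h>

const char *bitap_bitwise_search(const char *text, const char *pattern)
{
    int m = strlen(pattern);
    unsigned long R;
    unsigned long pattern_mask[UCHAR_MAX + 1];
    int i;

    if (pattern[0] == '\0') return text;
    /* R uses bits 0..m, and unsigned long is only guaranteed 32 bits */
    if (m > 31) return NULL;   /* pattern too long for this implementation */

    /* Initialize the bit array R: bit 0 is 0 because the empty prefix matches */
    R = ~1UL;

    /* Initialize the pattern bitmasks: bit i of pattern_mask[c] is 0 iff pattern[i] == c */
    for (i=0; i <= UCHAR_MAX; ++i)
        pattern_mask[i] = ~0UL;
    for (i=0; i < m; ++i)
        pattern_mask[(unsigned char)pattern[i]] &= ~(1UL << i);

    for (i=0; text[i] != '\0'; ++i) {
        /* Update the bit array */
        R |= pattern_mask[(unsigned char)text[i]];
        R <<= 1;

        /* Bit m is 0 when the whole pattern matches, ending at position i */
        if (0 == (R & (1UL << m)))
            return text + (i - m + 1);
    }

    return NULL;
}
```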
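As noted earlier in this section, bitap can also be written with the intuitive convention in which a 1 bit means "match", at the cost of setting the low bit explicitly on every step. A minimal sketch of that variant, commonly called shift-and, follows. The function name bitap_shift_and_search is hypothetical, and the length check based on sizeof and CHAR_BIT is our own choice rather than the fixed limit of 31 used above.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name: bitap with "1 means match" semantics (shift-and). */
const char *bitap_shift_and_search(const char *text, const char *pattern)
{
    size_t m = strlen(pattern);
    unsigned long mask[UCHAR_MAX + 1];
    unsigned long R = 0;      /* no non-empty prefix matches yet */
    size_t i;

    if (m == 0) return text;
    if (m > sizeof R * CHAR_BIT) return NULL;   /* pattern must fit in R */

    /* mask[c] has bit i set exactly when pattern[i] == c */
    for (i = 0; i <= UCHAR_MAX; ++i)
        mask[i] = 0;
    for (i = 0; i < m; ++i)
        mask[(unsigned char)pattern[i]] |= 1UL << i;

    for (i = 0; text[i] != '\0'; ++i) {
        /* Shift, then set bit 0 explicitly: this is the extra "| 1"
         * that the zero-means-match version gets for free. */
        R = ((R << 1) | 1UL) & mask[(unsigned char)text[i]];
        /* Bit m-1 set: the whole pattern ends at position i */
        if (R & (1UL << (m - 1)))
            return text + (i + 1 - m);
    }
    return NULL;
}
```

Because the full match is signalled by bit m-1 rather than bit m, this form can use every bit of R for the pattern. With s = "hello world", bitap_shift_and_search(s, "world") should return s + 6.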

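The remark about smaller alphabets can be made concrete by shrinking the mask table. The sketch below is a hypothetical specialization (dna_bitap_search and dna_index are our own names) that keeps the zero-means-match convention and assumes the pattern is drawn only from the nucleotide letters A, C, G and T, so it needs four bitmasks instead of one per possible byte value. A text character outside that alphabet can match no pattern position, so it resets every partial match.

```c
#include <stddef.h>
#include <string.h>

/* Map a nucleotide to a mask index; -1 for anything else. */
static int dna_index(char c)
{
    switch (c) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;
    }
}

/* Hypothetical DNA-only bitap: 0 bits mean "match", as in bitap_bitwise_search. */
const char *dna_bitap_search(const char *text, const char *pattern)
{
    size_t m = strlen(pattern);
    unsigned long masks[4];   /* one mask per symbol of the alphabet */
    unsigned long R = ~1UL;   /* bit 0 = 0: the empty prefix matches */
    size_t i;
    int k;

    if (m == 0) return text;
    if (m > 31) return NULL;  /* R uses bits 0..m of a 32-bit-minimum type */

    for (k = 0; k < 4; ++k)
        masks[k] = ~0UL;
    for (i = 0; i < m; ++i) {
        int idx = dna_index(pattern[i]);
        if (idx < 0) return NULL;        /* assumption: pattern must be DNA */
        masks[idx] &= ~(1UL << i);
    }

    for (i = 0; text[i] != '\0'; ++i) {
        int idx = dna_index(text[i]);
        /* A symbol outside the alphabet sets every bit: no prefix survives. */
        R |= (idx < 0) ? ~0UL : masks[idx];
        R <<= 1;
        if ((R & (1UL << m)) == 0)
            return text + (i + 1 - m);
    }
    return NULL;
}
```

For instance, searching "GATTACA" for "TAC" should return a pointer to offset 3.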