Bitap Algorithm - Exact Searching

Exact Searching

The bitap algorithm for exact string searching, in full generality, looks like this when implemented in C:

#include #include typedef char BIT; /* needs only to hold the values 0 and 1 */ const char *bitap_search(const char *text, const char *pattern) { const char *result = NULL; int m = strlen(pattern); BIT *R; int i, k; if (pattern == '\0') return text; /* Initialize the bit array R */ R = malloc((m+1) * sizeof *R); R = 1; for (k=1; k <= m; ++k) R = 0; for (i=0; text != '\0'; ++i) { /* Update the bit array. */ for (k=m; k >= 1; --k) R = R && (text == pattern); if (R) { result = (text+i - m) + 1; break; } } free(R); return result; }

Bitap distinguishes itself from other well-known string searching algorithms in its natural mapping onto simple bitwise operations, as in the following modification of the above program. Notice that in this implementation, counterintuitively, each bit with value zero indicates a match, and each bit with value 1 indicates a non-match. The same algorithm can be written with the intuitive semantics for 0 and 1, but in that case we must introduce another instruction into the inner loop to set R |= 1. In this implementation, we take advantage of the fact that left-shifting a value shifts in zeros on the right, which is precisely the behavior we need.

Notice also that we require CHAR_MAX additional bitmasks in order to convert the (text == pattern) condition in the general implementation into bitwise operations. Therefore, the bitap algorithm performs better when applied to inputs over smaller alphabets.

#include #include const char *bitap_bitwise_search(const char *text, const char *pattern) { int m = strlen(pattern); unsigned long R; unsigned long pattern_mask; int i; if (pattern == '\0') return text; if (m > 31) return "The pattern is too long!"; /* Initialize the bit array R */ R = ~1; /* Initialize the pattern bitmasks */ for (i=0; i <= CHAR_MAX; ++i) pattern_mask = ~0; for (i=0; i < m; ++i) pattern_mask] &= ~(1UL << i); for (i=0; text != '\0'; ++i) { /* Update the bit array */ R |= pattern_mask]; R <<= 1; if (0 == (R & (1UL << m))) return (text + i - m) + 1; } return NULL; }

Read more about this topic:  Bitap Algorithm

Famous quotes containing the words exact and/or searching:

    In the most desirable conditions, the child learns to manage anxiety by being exposed to just the right amounts of it, not much more and not much less. This optimal amount of anxiety varies with the child’s age and temperament. It may also vary with cultural values.... There is no mathematical formula for calculating exact amounts of optimal anxiety. This is why child rearing is an art and not a science.
    Alicia F. Lieberman (20th century)

    A government deriving its energy from the will of the society, and operating, by the reason of its measures, on the understanding and interest of the society ... is the government for which philosophy has been searching and humanity been fighting from the most remote ages ... which it is the glory of America to have invented, and her unrivalled happiness to possess.
    James Madison (1751–1836)