Bitap Algorithm - Exact Searching

Exact Searching

The bitap algorithm for exact string searching, in full generality, looks like this when implemented in C:

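    #include <stdlib.h>
    #include <string.h>

    typedef char BIT;  /* needs only to hold the values 0 and 1 */

    const char *bitap_search(const char *text, const char *pattern)
    {
        const char *result = NULL;
        int m = strlen(pattern);
        BIT *R;
        int i, k;

        if (pattern[0] == '\0') return text;

        /* Initialize the bit array R: only the empty prefix matches so far */
        R = malloc((m+1) * sizeof *R);
        if (R == NULL) return NULL;  /* allocation failed; reported as "no match" */
        R[0] = 1;
        for (k=1; k <= m; ++k)
            R[k] = 0;

        for (i=0; text[i] != '\0'; ++i) {
            /* Update the bit array, highest index first,
               so that R[k-1] still holds the previous step's value */
            for (k=m; k >= 1; --k)
                R[k] = R[k-1] && (text[i] == pattern[k-1]);

            /* R[m] is set when the whole pattern matches, ending at text[i] */
            if (R[m]) {
                result = (text+i - m) + 1;
                break;
            }
        }

        free(R);
        return result;
    }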
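To make the role of the array R more concrete, the following sketch isolates the per-character update and exposes the resulting state as a string of 0 and 1 digits. The names bitap_step and bitap_state_after are hypothetical helpers introduced only for this illustration; unlike the search routine, the helper does not stop at the first match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Advance the state array R (m + 1 entries) by one text character c.
 * After the call, R[k] == 1 exactly when the first k pattern characters
 * match the text ending at c.  R[0] stays 1: the empty prefix always matches.
 * k runs downward so that R[k - 1] still holds the previous step's value. */
void bitap_step(char *R, const char *pattern, size_t m, char c)
{
    size_t k;
    for (k = m; k >= 1; --k)
        R[k] = (char)(R[k - 1] && c == pattern[k - 1]);
}

/* Hypothetical helper: run bitap_step over all of text and write the final
 * state into out as a string of '0'/'1' characters.
 * out must have room for strlen(pattern) + 2 bytes. */
void bitap_state_after(const char *text, const char *pattern, char *out)
{
    size_t m = strlen(pattern);
    size_t i, k;

    assert(out != NULL);

    /* Initial state: only the empty prefix matches. */
    out[0] = 1;
    for (k = 1; k <= m; ++k)
        out[k] = 0;

    for (i = 0; text[i] != '\0'; ++i)
        bitap_step(out, pattern, m, text[i]);

    /* Convert the 0/1 values to printable digits and terminate the string. */
    for (k = 0; k <= m; ++k)
        out[k] = (char)(out[k] ? '1' : '0');
    out[m + 1] = '\0';
}
```

For the pattern "aba", feeding the texts "a", "ab" and "aba" leaves the states 1100, 1010 and 1101 respectively. The last entry becoming 1 is what signals a complete match.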

Bitap distinguishes itself from other well-known string searching algorithms in its natural mapping onto simple bitwise operations, as in the following modification of the above program. Notice that in this implementation, counterintuitively, each bit with value zero indicates a match, and each bit with value 1 indicates a non-match. The same algorithm can be written with the intuitive semantics for 0 and 1, but in that case we must introduce another instruction into the inner loop to set R |= 1. In this implementation, we take advantage of the fact that left-shifting a value shifts in zeros on the right, which is precisely the behavior we need.

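Notice also that we require one additional bitmask for every possible character value, a table on the order of CHAR_MAX entries, in order to convert the (text[i] == pattern[k-1]) condition in the general implementation into bitwise operations. The cost of building and storing this table grows with the size of the alphabet, so the bitap algorithm performs better when applied to inputs over smaller alphabets.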
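As a hedged illustration of the mask table just described, the sketch below builds one mask per unsigned char value using the zero-means-match convention. The function name bitap_build_masks and its return-code convention are assumptions made for this example, not part of the original program.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Fill masks[] so that bit i of masks[c] is 0 exactly when pattern[i] == c,
 * and 1 otherwise.  One table lookup then answers "which pattern positions
 * equal this text character?" for all positions at once.
 * Returns 0 on success, or -1 if the pattern needs more bits than an
 * unsigned long can hold (one bit is reserved for the match position). */
int bitap_build_masks(const char *pattern, unsigned long masks[UCHAR_MAX + 1])
{
    size_t i;

    assert(pattern != NULL && masks != NULL);

    /* Start with "no position matches" for every character value. */
    for (i = 0; i <= UCHAR_MAX; ++i)
        masks[i] = ~0UL;

    for (i = 0; pattern[i] != '\0'; ++i) {
        if (i >= sizeof(unsigned long) * CHAR_BIT - 1)
            return -1;
        /* Cast through unsigned char so the index is never negative. */
        masks[(unsigned char)pattern[i]] &= ~(1UL << i);
    }
    return 0;
}
```

For the pattern "abca", the entry for 'a' has bits 0 and 3 cleared, the entry for 'b' has bit 1 cleared, the entry for 'c' has bit 2 cleared, and every other entry is all ones. The table has UCHAR_MAX + 1 entries regardless of pattern length, which is where the dependence on alphabet size comes from.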

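    #include <string.h>
    #include <limits.h>

    const char *bitap_bitwise_search(const char *text, const char *pattern)
    {
        int m = strlen(pattern);
        unsigned long R;
        /* One mask per unsigned char value; indexing by plain char
           would go out of bounds where char is signed */
        unsigned long pattern_mask[UCHAR_MAX+1];
        int i;

        if (pattern[0] == '\0') return text;
        /* Note: this is a message string, not a pointer into text */
        if (m > 31) return "The pattern is too long!";

        /* Initialize the bit array R: bit 0 clear, all other bits set */
        R = ~1UL;

        /* Initialize the pattern bitmasks: bit i of pattern_mask[c]
           is 0 exactly when pattern[i] == c */
        for (i=0; i <= UCHAR_MAX; ++i)
            pattern_mask[i] = ~0UL;
        for (i=0; i < m; ++i)
            pattern_mask[(unsigned char)pattern[i]] &= ~(1UL << i);

        for (i=0; text[i] != '\0'; ++i) {
            /* Update the bit array */
            R |= pattern_mask[(unsigned char)text[i]];
            R <<= 1;

            /* Bit m clear means the whole pattern matched, ending at text[i] */
            if (0 == (R & (1UL << m)))
                return (text + i - m) + 1;
        }

        return NULL;
    }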
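For comparison, here is a minimal sketch of the variant with the intuitive semantics mentioned earlier, where a 1 bit indicates a match. It mirrors the bitwise program step for step, with the OR replaced by an AND and the extra R |= 1 instruction added after the shift. The function name bitap_search_ones is hypothetical, and returning NULL for an over-long pattern is a choice made for this sketch only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Bitap with 1 = match.  Bit k of R is set when the first k pattern
 * characters match the text ending at the current position; bit 0 (the
 * empty prefix) must be set again by hand after every shift.
 * Returns a pointer to the first match, or NULL if there is none or if
 * the pattern is longer than 31 characters (an assumption of this sketch). */
const char *bitap_search_ones(const char *text, const char *pattern)
{
    unsigned long mask[UCHAR_MAX + 1] = {0};
    unsigned long R = 1UL;          /* only the empty prefix matches */
    size_t m, i;

    assert(text != NULL && pattern != NULL);

    m = strlen(pattern);
    if (m == 0) return text;
    if (m > 31) return NULL;

    /* Bit i of mask[c] is 1 exactly when pattern[i] == c. */
    for (i = 0; i < m; ++i)
        mask[(unsigned char)pattern[i]] |= 1UL << i;

    for (i = 0; text[i] != '\0'; ++i) {
        R &= mask[(unsigned char)text[i]];  /* keep prefixes that extend */
        R <<= 1;                            /* each is now one longer */
        R |= 1UL;                           /* the extra instruction */

        if (R & (1UL << m))
            return text + i - m + 1;
    }
    return NULL;
}
```

Searching "hello world" for "world" returns a pointer to offset 6 of the text. The only difference in cost from the zero-means-match version is the additional OR in the loop body, which is the trade-off the text describes.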
