Bitap Algorithm - Fuzzy Searching

Fuzzy Searching

To perform fuzzy string searching using the bitap algorithm, it is necessary to extend the bit array R into a second dimension. Instead of having a single array R that changes over the length of the text, we now have k + 1 distinct arrays R_0, ..., R_k, one for each permitted number of errors from 0 to k. Array R_i holds a representation of the prefixes of the pattern that match any suffix of the text read so far with i or fewer errors. In this context, an "error" may be an insertion, deletion, or substitution; see Levenshtein distance for more information on these operations.
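As a rough illustration of the description above, the following is a minimal sketch of an update step that counts all three kinds of error. It is not taken from the article: the function name bitap_levenshtein_end, its return convention (the index one past the end of the first match, or -1) and the restriction k < m <= 31 are assumptions made for this example. It uses the inverted convention, in which a 0 bit at position i means "the pattern prefix of length i matches".

```c
#include <limits.h>
#include <string.h>

/* Hypothetical helper: returns the index one past the end of the first
   substring of text within edit distance k of pattern, or -1 if there is
   none or the arguments are outside the range this sketch supports. */
long bitap_levenshtein_end(const char *text, const char *pattern, int k)
{
    unsigned long R[32];
    unsigned long pattern_mask[UCHAR_MAX + 1];
    size_t m = strlen(pattern);
    size_t i;
    int d;

    if (m == 0 || m > 31 || k < 0 || (size_t)k >= m)
        return -1;

    /* With d errors allowed, the first d pattern characters may be deleted,
       so prefixes of length 0..d already match before any text is read. */
    for (d = 0; d <= k; ++d)
        R[d] = ~0UL << (d + 1);

    /* Bit i of a character's mask is 0 when pattern[i] is that character. */
    for (i = 0; i <= UCHAR_MAX; ++i)
        pattern_mask[i] = ~0UL;
    for (i = 0; i < m; ++i)
        pattern_mask[(unsigned char)pattern[i]] &= ~(1UL << i);

    for (i = 0; text[i] != '\0'; ++i) {
        unsigned long mask = pattern_mask[(unsigned char)text[i]];
        unsigned long old_prev = R[0];      /* R[d-1] before this character */

        R[0] = (R[0] | mask) << 1;
        for (d = 1; d <= k; ++d) {
            unsigned long tmp = R[d];
            R[d] = ((R[d] | mask) << 1)     /* characters match          */
                 & (old_prev << 1)          /* substitution              */
                 & old_prev                 /* extra character in text   */
                 & (R[d - 1] << 1);         /* character missing in text */
            old_prev = tmp;
        }
        if ((R[k] & (1UL << m)) == 0)
            return (long)(i + 1);
    }
    return -1;
}
```

Because a match may contain insertions or deletions, its length is not fixed, so this sketch reports only where the match ends. For example, searching "abxcd" for "abcd" with k = 1 gives 5, and with k = 0 gives -1.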

The implementation below performs fuzzy matching (returning the first match with up to k errors) using the fuzzy bitap algorithm. However, it only pays attention to substitutions, not to insertions or deletions; in other words, it finds the first substring within a Hamming distance of at most k from the pattern. As before, the semantics of 0 and 1 are reversed from their intuitive meanings: a 0 bit marks a match.

    #include <stdlib.h>
    #include <string.h>
    #include <limits.h>

    const char *bitap_fuzzy_bitwise_search(const char *text, const char *pattern, int k)
    {
        const char *result = NULL;
        size_t m = strlen(pattern);
        unsigned long *R;
        /* Indexed by unsigned char so that negative char values are safe */
        unsigned long pattern_mask[UCHAR_MAX+1];
        size_t i;
        int d;

        if (pattern[0] == '\0') return text;
        if (m > 31) return "The pattern is too long!";

        /* Initialize the bit arrays R[0..k] */
        R = malloc((k+1) * sizeof *R);
        if (R == NULL) return NULL;
        for (d=0; d <= k; ++d)
            R[d] = ~1UL;

        /* Initialize the pattern bitmasks */
        for (i=0; i <= UCHAR_MAX; ++i)
            pattern_mask[i] = ~0UL;
        for (i=0; i < m; ++i)
            pattern_mask[(unsigned char)pattern[i]] &= ~(1UL << i);

        for (i=0; text[i] != '\0'; ++i) {
            /* Update the bit arrays */
            unsigned long mask = pattern_mask[(unsigned char)text[i]];
            unsigned long old_Rd1 = R[0];

            R[0] |= mask;
            R[0] <<= 1;

            for (d=1; d <= k; ++d) {
                unsigned long tmp = R[d];
                /* Substitution is all we care about */
                R[d] = (old_Rd1 & (R[d] | mask)) << 1;
                old_Rd1 = tmp;
            }

            /* Bit m of R[k] is 0 when the whole pattern matches */
            if (0 == (R[k] & (1UL << m))) {
                result = (text+i - m) + 1;
                break;
            }
        }

        free(R);
        return result;
    }
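To check what "up to k substitutions" means in practice, it can help to compare the bitap result against a brute-force scan. The following is a minimal sketch of such a reference. The name naive_hamming_search is hypothetical and not part of the implementation above. It slides a window of the pattern's length over the text and counts mismatching characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reference implementation: returns a pointer to the first
   window of text that differs from pattern in at most k positions, or
   NULL if no such window exists. */
const char *naive_hamming_search(const char *text, const char *pattern, int k)
{
    size_t n = strlen(text);
    size_t m = strlen(pattern);
    size_t start, j;

    if (m == 0) return text;
    if (m > n) return NULL;

    for (start = 0; start + m <= n; ++start) {
        int mismatches = 0;
        /* Stop counting as soon as the window has too many mismatches */
        for (j = 0; j < m && mismatches <= k; ++j)
            if (text[start + j] != pattern[j])
                ++mismatches;
        if (mismatches <= k)
            return text + start;
    }
    return NULL;
}
```

Since every candidate has the same length as the pattern, the first window found by this scan should be the same one the substitution-only bitap search reports. For instance, searching "the quick brown fox" for "quack" with k = 1 returns a pointer to "quick", while k = 0 returns NULL.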
