Bitap Algorithm - Fuzzy Searching

Fuzzy Searching

To perform fuzzy string searching using the bitap algorithm, it is necessary to extend the bit array R into a second dimension. Instead of having a single array R that changes over the length of the text, we now have k distinct arrays R1..k. Array Ri holds a representation of the prefixes of pattern that match any suffix of the current string with i or fewer errors. In this context, an "error" may be an insertion, deletion, or substitution; see Levenshtein distance for more information on these operations.

The implementation below performs fuzzy matching (returning the first match with up to k errors) using the fuzzy bitap algorithm. However, it only pays attention to substitutions, not to insertions or deletions — in other words, a Hamming distance of k. As before, the semantics of 0 and 1 are reversed from their intuitive meanings.

#include #include #include const char *bitap_fuzzy_bitwise_search(const char *text, const char *pattern, int k) { const char *result = NULL; int m = strlen(pattern); unsigned long *R; unsigned long pattern_mask; int i, d; if (pattern == '\0') return text; if (m > 31) return "The pattern is too long!"; /* Initialize the bit array R */ R = malloc((k+1) * sizeof *R); for (i=0; i <= k; ++i) R = ~1; /* Initialize the pattern bitmasks */ for (i=0; i <= CHAR_MAX; ++i) pattern_mask = ~0; for (i=0; i < m; ++i) pattern_mask] &= ~(1UL << i); for (i=0; text != '\0'; ++i) { /* Update the bit arrays */ unsigned long old_Rd1 = R; R |= pattern_mask]; R <<= 1; for (d=1; d <= k; ++d) { unsigned long tmp = R; /* Substitution is all we care about */ R = (old_Rd1 & (R | pattern_mask])) << 1; old_Rd1 = tmp; } if (0 == (R & (1UL << m))) { result = (text+i - m) + 1; break; } } free(R); return result; }

Read more about this topic:  Bitap Algorithm

Famous quotes containing the words fuzzy and/or searching:

    What do you think of us in fuzzy endeavor, you whose directions are sterling, whose lunge is straight?
    Can you make a reason, how can you pardon us who memorize the rules and never score?
    Gwendolyn Brooks (b. 1917)

    I have been searching history to see if really a woman has any precedent to claim the right to have her rights, and I am compelled to say that we men are not so much ahead of women after all, and the only way we have kept our reputation up is by keeping her down—and don’t you forget it!
    George E. Foster, U.S. women’s magazine contributor. The Woman’s Magazine, pp. 38-41 (October 1886)