Bitap Algorithm - Fuzzy Searching

Fuzzy Searching

To perform fuzzy string searching using the bitap algorithm, it is necessary to extend the bit array R into a second dimension. Instead of having a single array R that changes over the length of the text, we now have k distinct arrays R1..k. Array Ri holds a representation of the prefixes of pattern that match any suffix of the current string with i or fewer errors. In this context, an "error" may be an insertion, deletion, or substitution; see Levenshtein distance for more information on these operations.

The implementation below performs fuzzy matching (returning the first match with up to k errors) using the fuzzy bitap algorithm. However, it only pays attention to substitutions, not to insertions or deletions — in other words, a Hamming distance of k. As before, the semantics of 0 and 1 are reversed from their intuitive meanings.

#include #include #include const char *bitap_fuzzy_bitwise_search(const char *text, const char *pattern, int k) { const char *result = NULL; int m = strlen(pattern); unsigned long *R; unsigned long pattern_mask; int i, d; if (pattern == '\0') return text; if (m > 31) return "The pattern is too long!"; /* Initialize the bit array R */ R = malloc((k+1) * sizeof *R); for (i=0; i <= k; ++i) R = ~1; /* Initialize the pattern bitmasks */ for (i=0; i <= CHAR_MAX; ++i) pattern_mask = ~0; for (i=0; i < m; ++i) pattern_mask] &= ~(1UL << i); for (i=0; text != '\0'; ++i) { /* Update the bit arrays */ unsigned long old_Rd1 = R; R |= pattern_mask]; R <<= 1; for (d=1; d <= k; ++d) { unsigned long tmp = R; /* Substitution is all we care about */ R = (old_Rd1 & (R | pattern_mask])) << 1; old_Rd1 = tmp; } if (0 == (R & (1UL << m))) { result = (text+i - m) + 1; break; } } free(R); return result; }

Read more about this topic:  Bitap Algorithm

Famous quotes containing the words fuzzy and/or searching:

    Even their song is not a sure thing.
    It is not a language;
    it is a kind of breathing.
    They are two asthmatics
    whose breath sobs in and out
    through a small fuzzy pipe.
    Anne Sexton (1928–1974)

    And I cannot find the place
    Where his paw is the snare!
    Little One! Oh, Little One!
    I am searching everywhere!
    James Kenneth Stephens (1882–1950)