Bitap Algorithm - Fuzzy Searching

Fuzzy Searching

To perform fuzzy string searching using the bitap algorithm, it is necessary to extend the bit array R into a second dimension. Instead of having a single array R that changes over the length of the text, we now have k distinct arrays R1..k. Array Ri holds a representation of the prefixes of pattern that match any suffix of the current string with i or fewer errors. In this context, an "error" may be an insertion, deletion, or substitution; see Levenshtein distance for more information on these operations.

The implementation below performs fuzzy matching (returning the first match with up to k errors) using the fuzzy bitap algorithm. However, it only pays attention to substitutions, not to insertions or deletions — in other words, a Hamming distance of k. As before, the semantics of 0 and 1 are reversed from their intuitive meanings.

#include #include #include const char *bitap_fuzzy_bitwise_search(const char *text, const char *pattern, int k) { const char *result = NULL; int m = strlen(pattern); unsigned long *R; unsigned long pattern_mask; int i, d; if (pattern == '\0') return text; if (m > 31) return "The pattern is too long!"; /* Initialize the bit array R */ R = malloc((k+1) * sizeof *R); for (i=0; i <= k; ++i) R = ~1; /* Initialize the pattern bitmasks */ for (i=0; i <= CHAR_MAX; ++i) pattern_mask = ~0; for (i=0; i < m; ++i) pattern_mask] &= ~(1UL << i); for (i=0; text != '\0'; ++i) { /* Update the bit arrays */ unsigned long old_Rd1 = R; R |= pattern_mask]; R <<= 1; for (d=1; d <= k; ++d) { unsigned long tmp = R; /* Substitution is all we care about */ R = (old_Rd1 & (R | pattern_mask])) << 1; old_Rd1 = tmp; } if (0 == (R & (1UL << m))) { result = (text+i - m) + 1; break; } } free(R); return result; }

Read more about this topic:  Bitap Algorithm

Famous quotes containing the words fuzzy and/or searching:

    Even their song is not a sure thing.
    It is not a language;
    it is a kind of breathing.
    They are two asthmatics
    whose breath sobs in and out
    through a small fuzzy pipe.
    Anne Sexton (1928–1974)

    It is the essence of truth that it is never excessive. Why should it exaggerate? There is that which should be destroyed and that which should be simply illuminated and studied. How great is the force of benevolent and searching examination! We must not resort to the flame where only light is required.
    Victor Hugo (1802–1885)