Longest Common Substring Problem - Pseudocode

Pseudocode

The following pseudocode finds the set of longest common substrings between two strings with dynamic programming:

function LCSubstr(S, T) L := array(1..m, 1..n) z := 0 ret := {} for i := 1..m for j := 1..n if S = T if i = 1 or j = 1 L := 1 else L := L + 1 if L > z z := L ret := {S} if L = z ret := ret ∪ {S} else L=0; return ret

This algorithm runs in time. The variable z is used to hold the length of the longest common substring found so far. The set ret is used to hold the set of strings which are of length z. The set ret can be saved efficiently by just storing the index i, which is the last character of the longest common substring (of size z) instead of S. Thus all the longest common substrings would be, for each i in ret, S-z)..(ret)].

The following tricks can be used to reduce the memory usage of an implementation:

  • Keep only the last and current row of the DP table to save memory ( instead of )
  • Store only non-zero values in the rows. This can be done using hash tables instead of arrays. This is useful for large alphabets.

Read more about this topic:  Longest Common Substring Problem