Pseudocode
The following pseudocode finds the set of longest common substrings between two strings with dynamic programming:
function LCSubstr(S, T) L := array(1..m, 1..n) z := 0 ret := {} for i := 1..m for j := 1..n if S = T if i = 1 or j = 1 L := 1 else L := L + 1 if L > z z := L ret := {S} if L = z ret := ret ∪ {S} else L=0; return retThis algorithm runs in time. The variable z
is used to hold the length of the longest common substring found so far. The set ret
is used to hold the set of strings which are of length z
. The set ret
can be saved efficiently by just storing the index i
, which is the last character of the longest common substring (of size z) instead of S
. Thus all the longest common substrings would be, for each i in ret
, S-z)..(ret)]
.
The following tricks can be used to reduce the memory usage of an implementation:
- Keep only the last and current row of the DP table to save memory ( instead of )
- Store only non-zero values in the rows. This can be done using hash tables instead of arrays. This is useful for large alphabets.
Read more about this topic: Longest Common Substring Problem