KMP Algorithm
Intuition
Assume we have a string BBCABCDABABCDABCDABDE,</p>
we want to check it there is a substring ABCDABD
i denotes the index of string, j denotes the index of pattern
| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| i | ↓ | ||||||||||||||||||||||
| String | B | B | C | A | B | C | D | A | B | A | B | C | D | A | B | C | D | A | B | D | E | ||
| pattern | A | B | C | D | A | B | D | ||||||||||||||||
| patternIndex | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ||||||||||||||||
| j | ↑ | ||||||||||||||||||||||
| pattern | A | B | C | D | A | B | D | ||||||||||||||||
| patternIndex | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ||||||||||||||||
| next j | ↑ |
After we start matching iStart = 3, we can move to iStart to 4 intuitively.
However, ideally, we can virtual move iStart to 7, comparing i = 9 and j = 2 since we alreay compared i = 7, j = 4 and i = 8, j = 5
Thus, we can create a int[] next denotes that where j should go when j mismatches (s.charAt(i) != p.charAt(j))
Implementation
For pattern string mentioned above, we can have the following table.
| pattern string | A | B | C | D | A | B | D |
|---|---|---|---|---|---|---|---|
| patternIndex | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| length of the common prefix and suffix | 0 | 0 | 0 | 0 | 1 | 2 | 0 |
| next[] | -1 | 0 | 0 | 0 | 0 | 1 | 2 |
Thus, when a mismatch happens on j = 6, j should be set to next[j]
Denote k to the length of common suffix and prefix of we already have in string, which indicates that p[0, 1, 2, ..., k - 1] equals p[j - k, j - k + 1, ..., j - 1]
- If
p[k] == p[j],kcan be increased by 1.next[j + 1]will bek - If not,
k = next[k]
next[]
static int[] getNext(final String pattern) {
final int[] next = new int[pattern.length()];
if (pattern.length() == 0) {
return next;
}
next[0] = -1;
int k = -1, j = 0;
while (j < pattern.length() - 1) {
if (k == -1 || pattern.charAt(j) == pattern.charAt(k)) {
++k;
++j;
next[j] = k;
} else {
k = next[k];
}
}
return next;
}
substring()
在KMP模板,需要严格遵守 int i = 0, j = 0,并在外面比较 -> 否则,"" and "" 这个case过不去
public int substring(final String s, final String p) {
final int[] next = getNext(p);
final int sLen = s.length(), pLen = p.length();
int i = 0, j = 0;
while (i < sLen && j < pLen) {
if (j == -1 || s.charAt(i) == p.charAt(j)) {
// when mismatch
++i;
++j;
} else {
// when match
j = next[j];
}
}
if (j == pLen) {
return i - j;
} else {
return -1;
}
}
Reference
http://wiki.jikexueyuan.com/project/kmp-algorithm/define.html