Rabin-Karp 算法
Rabin-Karp是一种基于Hash的高效的字符串搜索算法
问题
给定长度为n的字符串s (文本串),长度为m的字符串t (模式串)
求t是否在s中出现过(t是否为s的子串)
朴素: 0(nm)
Rabin-Karp算法: 0(n + m)
思路:
计算s的每个长度为m的子串的Hash值(-一个宽度为m的滑动窗口滑过s)
检查是否与t的Hash值相等
选用的Hash函数:
把字符串看作一个b进制数(一个多项式) ,计算它(在十进制下)对p取模的值
举例:
取b= 131, p= 2^64
字符串foobar的Hash值为(a=1, b=2, f=6, o=15, r=18)
(61315 + 151314 + 15* 1313+2 1312+ 1131 + 18) mod 264
选取的b和p的值决定了Hash函数的质量
根据经验,b=131, 13331等, p为大质数,冲突概率极小
Hash值相等时可以再比对一下两个字符串 ,避免Hash碰撞问题
如何快速计算一个子串的Hash值?
s = “foobar”
先计算6个前缀子串的Hash值, 0(n):
H[i] = Hash(s[0… i-1]) = (H[i-1]*b + s[i- 1]) mod p
计算子串oba的的Hash值:
相当于b进制下两个数做减法(H[5]-H[2]b3)
fooba
-fo000
oba
Hash(s[l…r]) = (H[r + 1]- H[l] b^r-l+1)mod p, 0(1)
实战
28.实现strStr()
https://leetcode.cn/problems/implement-strstr/
class Solution { public: int strStr(string haystack, string needle) { int b = 131, p = 1e9 + 7; int n = haystack.size(); int m = needle.size(); vector<long long> H(n + 1, 0); for(int i = 1; i <= n; i++){ H[i] = (H[i - 1] * b + (haystack[i - 1] - 'a' + 1)) % p; } long long Hneedle = 0; long long powBm = 1; for(char ch : needle) { Hneedle = (Hneedle * b + (ch - 'a' + 1)) % p; powBm = powBm * b % p; } for(int l = 1;l <= n - m + 1; l++){ int r = l + m -1; if(((H[r] - H[l - 1] * powBm) % p + p) % p == Hneedle){ return l - 1; } } return -1; } };
class Solution { public: int strStr(string haystack, string needle) { int b = 131; int n = haystack.size(); int m = needle.size(); vector<unsigned long long> H(n + 1, 0); for(int i = 1; i <= n; i++) H[i] = H[i - 1] * b + (haystack[i - 1] - 'a' + 1); unsigned long long Hneedle = 0; unsigned long long powBM = 1; for(char ch : needle) { Hneedle = Hneedle * b + (ch - 'a' + 1); powBM = powBM * b; } for (int l = 1; l <= n - m + 1; l++) { int r = l + m - 1; if(H[r] - H[l - 1] * powBM == Hneedle) return l - 1; } return -1; } };
推荐一个零声学院免费公开课程,个人觉得老师讲得不错,分享给大家:Linux,Nginx,ZeroMQ,MySQL,Redis,fastdfs,MongoDB,ZK,流媒体,CDN,P2P,K8S,Docker,TCP/IP,协程,DPDK等技术内容,立即学习