• Search results for levenshtein

Question

How do I add a levenshtein function to MySQL?

保持可爱mmm 2020-05-11 11:23:34 views: 1, answers: 1

Answer

You can use Spark's own levenshtein function. It takes two strings to compare, so it cannot be used on arrays.

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{lag, levenshtein}

    // creating window - ordered by ID
    val window = Window.orderBy("id")

    // using the window with lag function to compare to previous value in each column
    df.withColumn("edit-d",
      levenshtein(($"name"), lag("name", 1).over(window)) +
      levenshtein(($"surname"), lag("surname", 1).over(window))
    ).show()

This gives the desired output:

    id  name  surname  array     edit-d
    1   AA    BB       [AA, BB]  null
    2   AA    BB       [AA, BB]  0
    3   AB    BB       [AB, BB]  1
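To make the window/lag logic above easier to follow, here is a minimal pure-Python sketch of the same computation. It does not use Spark; the helper names `levenshtein` and `edit_distance_to_previous` are made up for illustration, and the rows are the three sample rows from the output table.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_distance_to_previous(rows):
    """For rows of (id, name, surname), compare each row with the previous
    one in id order, as lag(..., 1) over a window ordered by id would."""
    result = []
    previous = None
    for row_id, name, surname in sorted(rows, key=lambda r: r[0]):
        if previous is None:
            dist = None  # the first row has no predecessor (null in Spark)
        else:
            dist = levenshtein(name, previous[0]) + levenshtein(surname, previous[1])
        result.append((row_id, dist))
        previous = (name, surname)
    return result


rows = [(1, "AA", "BB"), (2, "AA", "BB"), (3, "AB", "BB")]
print(edit_distance_to_previous(rows))  # [(1, None), (2, 0), (3, 1)]
```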

社区小助手 2019-12-02 01:47:59 views: 0, answers: 0

Question

Implementation of Levenshtein distance for MySQL / fuzzy search?

保持可爱mmm 2020-05-11 10:51:16 views: 0, answers: 1

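Answer

I connected to my MySQL server and simply executed the statement in MySQL Workbench, and it worked perfectly: I now have the new function levenshtein(). For example, this works as expected:

    SELECT levenshtein('abcde', 'abced')

    2

Source: Stack Overflow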

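保持可爱mmm 2020-05-11 11:23:45 views: 0, answers: 0

Answer

To search efficiently by Levenshtein distance, you need an efficient, specialized index such as a BK-tree. Unfortunately, no database system I know of, MySQL included, implements BK-tree indexes. If you are looking for full-text search rather than matching a single word per row, things get more complicated still. Offhand, I cannot think of any way to build a full-text index that would allow searching by Levenshtein distance.

Source: Stack Overflow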
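For readers unfamiliar with BK-trees, the following is a minimal in-memory sketch in Python, not something MySQL provides. The class name `BKTree`, its methods, and the word list are all invented for illustration. The idea is that each child hangs off its parent under the edge labelled with their distance, so the triangle inequality lets a search skip every subtree whose edge label falls outside `[d - max_dist, d + max_dist]`.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


class BKTree:
    """Illustrative BK-tree; each node is a (word, {distance: child}) pair."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query, max_dist):
        """Return sorted (distance, word) pairs within max_dist of query."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_dist:
                results.append((d, word))
            # Triangle inequality: only these edges can lead to a match.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]:
    tree.add(w)
print(tree.search("bool", 1))  # [(1, 'boo'), (1, 'book'), (1, 'boon')]
```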

保持可爱mmm 2020-05-11 10:51:28 views: 0, answers: 0

Answer

You can use this function (adapted from http://www.artfulsoftware.com/infotree/queries.php#552):

    CREATE FUNCTION levenshtein( s1 text, s2 text )
      RETURNS int(11)
      DETERMINISTIC
    BEGIN
      DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
      DECLARE s1_char CHAR;
      -- cv0 and cv1 hold the current and previous row of the distance matrix,
      -- one byte per cell
      DECLARE cv0, cv1 text;
      SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
      IF s1 = s2 THEN
        RETURN 0;
      ELSEIF s1_len = 0 THEN
        RETURN s2_len;
      ELSEIF s2_len = 0 THEN
        RETURN s1_len;
      ELSE
        -- initialise the first row: 0, 1, ..., s2_len
        WHILE j <= s2_len DO
          SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
        END WHILE;
        WHILE i <= s1_len DO
          SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
          WHILE j <= s2_len DO
            SET c = c + 1;
            IF s1_char = SUBSTRING(s2, j, 1) THEN
              SET cost = 0;
            ELSE
              SET cost = 1;
            END IF;
            SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
            IF c > c_temp THEN SET c = c_temp; END IF;
            SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
            IF c > c_temp THEN SET c = c_temp; END IF;
            SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
          END WHILE;
          SET cv1 = cv0, i = i + 1;
        END WHILE;
      END IF;
      RETURN c;
    END

To get the result as a percentage (XX%), use this function:

    CREATE FUNCTION levenshtein_ratio( s1 text, s2 text )
      RETURNS int(11)
      DETERMINISTIC
    BEGIN
      DECLARE s1_len, s2_len, max_len INT;
      SET s1_len = LENGTH(s1), s2_len = LENGTH(s2);
      IF s1_len > s2_len THEN
        SET max_len = s1_len;
      ELSE
        SET max_len = s2_len;
      END IF;
      RETURN ROUND((1 - LEVENSHTEIN(s1, s2) / max_len) * 100);
    END

Source: Stack Overflow
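As a way to sanity-check the two SQL functions outside the database, here is a rough Python counterpart. It is a sketch, not a port: it keeps the distance rows in lists rather than byte strings, measures length in characters, rounds halves up with integer arithmetic instead of calling `ROUND`, and returns 100 for two empty strings, which is an assumption made here (the SQL version would divide by zero in that case).

```python
def levenshtein(s1: str, s2: str) -> int:
    """Edit distance, keeping only the previous and current matrix rows."""
    if s1 == s2:
        return 0
    if not s1:
        return len(s2)
    if not s2:
        return len(s1)
    prev = list(range(len(s2) + 1))
    for i, ch in enumerate(s1, start=1):
        curr = [i]
        for j, other in enumerate(s2, start=1):
            cost = 0 if ch == other else 1
            curr.append(min(curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost,   # substitution or match
                            prev[j] + 1))         # deletion
        prev = curr
    return prev[-1]


def levenshtein_ratio(s1: str, s2: str) -> int:
    """Similarity as a whole-number percentage of the longer string's length."""
    max_len = max(len(s1), len(s2))
    if max_len == 0:
        return 100  # assumption: two empty strings count as identical
    dist = levenshtein(s1, s2)
    # Integer round-half-up of 100 * (max_len - dist) / max_len.
    return (200 * (max_len - dist) + max_len) // (2 * max_len)


print(levenshtein("abcde", "abced"))        # 2
print(levenshtein_ratio("abcde", "abced"))  # 60
```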

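保持可爱mmm 2020-05-16 22:01:29 views: 0, answers: 0

Question

10 PHP functions you may never have used: error

kun坤 2020-06-12 22:13:10 views: 0, answers: 1

Question

10 PHP functions you may never have used: configuration error

kun坤 2020-05-31 19:10:38 views: 0, answers: 1

Question

Is there any built-in pandas operation that finds similar columns across two different DataFrames?

kun坤 2019-12-25 21:59:40 views: 2, answers: 0

Question

Cleaning data with SQL

游客syccbxcrjoo2g 2019-12-01 22:07:08 views: 5, answers: 0

Question

A list of tools for Python web crawlers

驻云科技 2019-12-01 21:44:42 views: 4079, answers: 2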

Answer

Here is what I wrote for an almost identical stack (we needed to standardize hardware manufacturer names, and there were all sorts of variations). It is client-side, though (VB.Net, to be exact), and uses the Levenshtein distance algorithm (modified for better results):

    ' Requires: Imports System.Reflection and Imports System.Threading.
    ' WriteDebugMessage is the author's own logging helper and is not shown here.

    Public Shared Function FindMostSimilarString(ByVal toFind As String, ByVal ParamArray stringList() As String) As String
        Dim bestMatch As String = ""
        Dim bestDistance As Integer = 1000 'Almost anything should be better than that!
        For Each matchCandidate As String In stringList
            Dim candidateDistance As Integer = LevenshteinDistance(toFind, matchCandidate)
            If candidateDistance < bestDistance Then
                bestMatch = matchCandidate
                bestDistance = candidateDistance
            End If
        Next
        Return bestMatch
    End Function

    'This will be used to determine how similar strings are. Modified from the link below...
    'Fxn from: http://ca0v.terapad.com/index.cfm?fa=contentNews.newsDetails&newsID=37030&from=list
    Public Shared Function LevenshteinDistance(ByVal s As String, ByVal t As String) As Integer
        Dim sLength As Integer = s.Length ' length of s
        Dim tLength As Integer = t.Length ' length of t
        Dim lvCost As Integer ' cost
        Dim lvDistance As Integer = 0
        Dim zeroCostCount As Integer = 0
        Try
            ' Step 1
            If tLength = 0 Then
                Return sLength
            ElseIf sLength = 0 Then
                Return tLength
            End If
            Dim lvMatrixSize As Integer = (1 + sLength) * (1 + tLength)
            Dim poBuffer() As Integer = New Integer(0 To lvMatrixSize - 1) {}
            ' fill first row
            For lvIndex As Integer = 0 To sLength
                poBuffer(lvIndex) = lvIndex
            Next
            'fill first column
            For lvIndex As Integer = 1 To tLength
                poBuffer(lvIndex * (sLength + 1)) = lvIndex
            Next
            For lvRowIndex As Integer = 0 To sLength - 1
                Dim s_i As Char = s(lvRowIndex)
                For lvColIndex As Integer = 0 To tLength - 1
                    If s_i = t(lvColIndex) Then
                        lvCost = 0
                        zeroCostCount += 1
                    Else
                        lvCost = 1
                    End If
                    ' Step 6
                    Dim lvTopLeftIndex As Integer = lvColIndex * (sLength + 1) + lvRowIndex
                    Dim lvTopLeft As Integer = poBuffer(lvTopLeftIndex)
                    Dim lvTop As Integer = poBuffer(lvTopLeftIndex + 1)
                    Dim lvLeft As Integer = poBuffer(lvTopLeftIndex + (sLength + 1))
                    lvDistance = Math.Min(lvTopLeft + lvCost, Math.Min(lvLeft, lvTop) + 1)
                    poBuffer(lvTopLeftIndex + sLength + 2) = lvDistance
                Next
            Next
        Catch ex As ThreadAbortException
            Err.Clear()
        Catch ex As Exception
            WriteDebugMessage(Application.StartupPath, [Assembly].GetExecutingAssembly().GetName.Name.ToString, MethodBase.GetCurrentMethod.Name, Err)
        End Try
        ' The modification: subtract the number of zero-cost (matching) comparisons
        Return lvDistance - zeroCostCount
    End Function
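The best-match loop above can be sketched in Python as follows. Note the swap: this sketch ranks candidates by the plain Levenshtein distance, not by the author's modified score (distance minus the count of matching comparisons), and it has no 1000 cap on the starting distance. The function names and the sample manufacturer names are invented for illustration.

```python
def levenshtein(s: str, t: str) -> int:
    """Standard (unmodified) edit distance using two rolling rows."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j - 1] + cost, min(curr[j - 1], prev[j]) + 1))
        prev = curr
    return prev[-1]


def find_most_similar_string(to_find: str, *candidates: str) -> str:
    """Return the candidate closest to to_find; the first one wins a tie.
    Returns "" when no candidates are given, like the VB.Net version."""
    best_match = ""
    best_distance = None
    for candidate in candidates:
        distance = levenshtein(to_find, candidate)
        if best_distance is None or distance < best_distance:
            best_match, best_distance = candidate, distance
    return best_match


# Hypothetical manufacturer names, used only to exercise the function.
print(find_most_similar_string("Hewlett Packard", "Hewlett-Packard", "Dell", "Lenovo"))
```

With these sample inputs the call picks "Hewlett-Packard", since it differs from the query by a single substitution.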

心有灵_夕 2019-12-29 12:50:04 views: 0, answers: 0

Question

Which Python tricks might you not know about?

游客bnlxddh3fwntw 2020-04-13 11:34:27 views: 33, answers: 1