前言
匹配算法的瓶颈之一在于如何判断词典中是否含有字符串,如果用有序集合(TreeMap)复杂度是O(logn),如果用散列列表(java的Hashmap或者Python的dict),时间复杂度降低,但是内存复杂度上升
所以需要自行设计算法
一、字典树是什么?
字典树(Trie、前缀树)和后缀树是单词处理的树形数据结构。
约定用None表示节点不对应词语,这样不能插入None值了,但是实现起来更简洁
二、实现代码
1.代码展示
代码如下(示例):
# -*- coding:utf-8 -*- class Node(object): def __init__(self, value) -> None: self._children = {} self._value = value def _add_child(self, char, value, overwrite=False): child = self._children.get(char) if child is None: child = Node(value) self._children[char] = child elif overwrite: child._value = value return child class Trie(Node): def __init__(self) -> None: super().__init__(None) def __contains__(self, key): return self[key] is not None def __getitem__(self, key): state = self for char in key: state = state._children.get(char) if state is None: return None return state._value def __setitem__(self, key, value): state = self for i, char in enumerate(key): if i < len(key) - 1: state = state._add_child(char, None, False) else: state = state._add_child(char, value, True)
2.测试代码
代码如下:
if __name__ == '__main__': trie = Trie() # 增 trie['自然'] = 'nature' trie['自然人'] = 'human' trie['自然语言'] = 'language' trie['自语'] = 'talk to oneself' trie['入门'] = 'introduction' assert '自然' in trie # 删 trie['自然'] = None assert '自然' not in trie # 改 trie['自然语言'] = 'human language' assert trie['自然语言'] == 'human language' # 查 assert trie['入门'] == 'introduction'
实测
运行效果展示
简单来说,没有想象的那么复杂,通过设计一个前缀树,随着路径的深入,前缀匹配是递进的过程,算法不必比较字符串的前缀,所以就会相对较快一些。