[LeetCode] Design Search Autocomplete System 设计搜索自动补全系统

简介:

Design a search autocomplete system for a search engine. Users may input a sentence (at least one word and end with a special character '#'). For each character they type except '#', you need to return the top 3historical hot sentences that have prefix the same as the part of sentence already typed. Here are the specific rules:

  1. The hot degree for a sentence is defined as the number of times a user typed the exactly same sentence before.
  2. The returned top 3 hot sentences should be sorted by hot degree (The first is the hottest one). If several sentences have the same degree of hot, you need to use ASCII-code order (smaller one appears first).
  3. If less than 3 hot sentences exist, then just return as many as you can.
  4. When the input is a special character, it means the sentence ends, and in this case, you need to return an empty list.

Your job is to implement the following functions:

The constructor function:

AutocompleteSystem(String[] sentences, int[] times): This is the constructor. The input is historical data. Sentences is a string array consists of previously typed sentences. Times is the corresponding times a sentence has been typed. Your system should record these historical data.

Now, the user wants to input a new sentence. The following function will provide the next character the user types:

List<String> input(char c): The input c is the next character typed by the user. The character will only be lower-case letters ('a' to 'z'), blank space (' ') or a special character ('#'). Also, the previously typed sentence should be recorded in your system. The output will be the top 3 historical hot sentences that have prefix the same as the part of sentence already typed.

Example:

Operation: AutocompleteSystem(["i love you", "island","ironman", "i love leetcode"], [5,3,2,2]) 
The system have already tracked down the following sentences and their corresponding times: 
"i love you" : 5 times 
"island" : 3 times 
"ironman" : 2 times 
"i love leetcode" : 2 times 
Now, the user begins another search: 
Operation: input('i') 
Output: ["i love you", "island","i love leetcode"] 
Explanation: 
There are four sentences that have prefix "i". Among them, "ironman" and "i love leetcode" have same hot degree. Since ' ' has ASCII code 32 and 'r' has ASCII code 114, "i love leetcode" should be in front of "ironman". Also we only need to output top 3 hot sentences, so "ironman" will be ignored. 
Operation: input(' ') 
Output: ["i love you","i love leetcode"] 
Explanation: 
There are only two sentences that have prefix "i ". 
Operation: input('a') 
Output: [] 
Explanation: 
There are no sentences that have prefix "i a". 
Operation: input('#') 
Output: [] 
Explanation: 
The user finished the input, the sentence "i a" should be saved as a historical sentence in system. And the following input will be counted as a new search. 

Note:

    1. The input sentence will always start with a letter and end with '#', and only one blank space will exist between two words.
    2. The number of complete sentences that to be searched won't exceed 100. The length of each sentence including those in the historical data won't exceed 100.
    3. Please use double-quote instead of single-quote when you write test cases even for a character input.
    4. Please remember to RESET your class variables declared in class AutocompleteSystem, as static/class variables are persisted across multiple test cases. Please see here for more details.

这道题让我们实现一个简单的搜索自动补全系统,我们用谷歌或者百度进行搜索时,会有这样的体验,输入些单词,搜索框会弹出一些以你输入为开头的一些完整的句子供你选择,这就是一种搜索自动补全系统。根据题目的要求,补全的句子是按之前出现的频率排列的,高频率的出现在最上面,如果频率相同,就按字母顺序来显示。输入规则是每次输入一个字符,然后返回自动补全的句子,如果遇到井字符,表示完整句子结束。那么我们肯定需要一个哈希map,建立句子和其出现频率的映射,还需要一个字符串data,用来保存之前输入过的字符。在构造函数中,给了我们一些句子,和其出现的次数,那么我们就直接将其加入哈希map,然后data初始化为空字符串。在input函数中,我们首先判读输入字符是否为井字符,如果是的话,那么表明当前的data字符串已经是一个完整的句子,在哈希表中次数加1,并且data清空,返回空集。否则的话我们将当前字符加入data字符串中,现在就要找出包含data前缀的前三高频句子了,我们使用优先队列来做,设计的思路是,始终用优先队列保存频率最高的三个句子,那么我们就应该把频率低的或者是字母顺序大的放在队首,以便随时可以移出队列,所以应该是个最小堆,队列里放句子和其出现频率的pair,并且根据其频率大小进行排序,所以我们要重写优先队列的comparator。然后我们遍历哈希表中的所有句子,我们首先要验证当前data字符串是否是其前缀,没啥好的方法,就逐个字符比较,用标识符matched,初始化为true,如果发现不匹配,则matched标记为false,并break掉。然后判断如果matched为true的话,说明data字符串是前缀,那么就把这个pair加入优先队列中,如果此时队列中的元素大于三个,那把队首元素移除,因为我们设计的是最小堆,所以频率小的句子会被先移除。然后就是将优先队列的元素加到结果res中,由于先出队列的是频率小的句子,所以要加到结果res的末尾,参见代码如下:

public:
    AutocompleteSystem(vector<string> sentences, vector<int> times) {
        for (int i = 0; i < sentences.size(); ++i) {
            freq[sentences[i]] += times[i]; 
        }
        data = "";
    }
    
    vector<string> input(char c) {
        if (c == '#') {
            ++freq[data];
            data = "";
            return {};
        }
        data.push_back(c);
        auto cmp = [](pair<string, int>& a, pair<string, int>& b) {
            return a.second > b.second || (a.second == b.second && a.first < b.first);
        };
        priority_queue<pair<string, int>, vector<pair<string, int>>, decltype(cmp) > q(cmp);
        for (auto f : freq) {
            bool matched = true;
            for (int i = 0; i < data.size(); ++i) {
                if (data[i] != f.first[i]) {
                    matched = false;
                    break;
                }
            }
            if (matched) {
                q.push(f);
                if (q.size() > 3) q.pop();
            }
        }
        vector<string> res(q.size());
        for (int i = q.size() - 1; i >= 0; --i) {
            res[i] = q.top().first; q.pop();
        }
        return res;
    }
    
private:
    unordered_map<string, int> freq;
    string data;
};

参考资料:

https://discuss.leetcode.com/topic/96090/straight-forward-hash-table-priority-queue-solution-in-c-no-trie

本文转自博客园Grandyang的博客,原文链接:[LeetCode] Design Search Autocomplete System 设计搜索自动补全系统

,如需转载请自行联系原博主。

相关文章
|
2月前
|
算法 索引
LeetCode(搜索插入位置)
如何使用二分查找算法来解决LeetCode上的“搜索插入位置”问题,确保时间复杂度为O(log n),并提供了详细的代码实现和分析。
15 2
|
2月前
|
索引
Leetcode第三十三题(搜索旋转排序数组)
这篇文章介绍了解决LeetCode第33题“搜索旋转排序数组”的方法,该问题要求在旋转过的升序数组中找到给定目标值的索引,如果存在则返回索引,否则返回-1,文章提供了一个时间复杂度为O(logn)的二分搜索算法实现。
24 0
Leetcode第三十三题(搜索旋转排序数组)
|
2月前
【LeetCode 39】700.二叉搜索树中的搜索
【LeetCode 39】700.二叉搜索树中的搜索
16 0
|
4月前
|
算法
LeetCode第81题搜索旋转排序数组 II
文章讲解了LeetCode第81题"搜索旋转排序数组 II"的解法,通过二分查找算法并加入去重逻辑来解决在旋转且含有重复元素的数组中搜索特定值的问题。
LeetCode第81题搜索旋转排序数组 II
|
4月前
|
算法
LeetCode第74题搜索二维矩阵
文章讲解了LeetCode第74题"搜索二维矩阵"的解决方案,利用二分搜索法将问题简化,并通过数学转换找到二维矩阵中的对应元素,展示了将二维问题转化为一维问题的解题技巧。
LeetCode第74题搜索二维矩阵
|
4月前
|
算法
LeetCode第35题搜索插入位置
这篇文章介绍了LeetCode第35题"搜索插入位置"的解题方法,通过使用二分查找法,高效地找到在有序数组中插入一个目标数的最佳位置。
LeetCode第35题搜索插入位置
|
4月前
|
算法
LeetCode第33题搜索旋转排序数组
这篇文章介绍了LeetCode第33题"搜索旋转排序数组"的解题方法,通过使用二分查找法并根据数组的有序性质调整搜索范围,实现了时间复杂度为O(log n)的高效搜索算法。
LeetCode第33题搜索旋转排序数组
|
4月前
|
算法 JavaScript Python
【Leetcode刷题Python】79. 单词搜索和剑指 Offer 12. 矩阵中的路径
Leetcode第79题"单词搜索"的Python解决方案,使用回溯算法在给定的二维字符网格中搜索单词,判断单词是否存在于网格中。
52 4
|
4月前
|
算法 Python
【Leetcode刷题Python】74. 搜索二维矩阵
两种解决LeetCode "搜索二维矩阵" 问题的方法的Python实现。第一种方法是从二维矩阵的右上角开始线性搜索,通过比较当前元素与目标值来决定搜索方向。第二种方法是将二维矩阵视为一维数组进行二分查找,通过计算中间元素的行列索引来更新搜索区间。两种方法都旨在高效地判断目标值是否存在于给定的有序二维矩阵中。
45 0
|
4月前
|
算法 索引 Python
【Leetcode刷题Python】33. 搜索旋转排序数组
解决LeetCode "搜索旋转排序数组" 问题的Python实现代码。代码使用了二分查找算法,首先检查目标值是否存在于数组中,然后通过比较数组中间值与数组首尾值来确定应该在数组的哪一半继续搜索,直到找到目标值或搜索范围为空。如果找到目标值,返回其索引;如果搜索结束仍未找到,返回 -1。
20 0