[LeetCode] Repeated DNA Sequences

This link has a great discussion of this problem, and you may refer to it if you like. In fact, the idea and code in this passage come from that link.

Well, there is a very intuitive solution to this problem. Starting from the first letter of the string, extract a substring of length 10 and check whether it has occurred before and has not yet been added to the result. If so, add it to the result; in either case, move to the next letter and repeat the process. However, a naive implementation of this idea will give an MLE (Memory Limit Exceeded) error, and this is the real obstacle of the problem.
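The intuitive approach described above can be sketched roughly as follows. This is only an illustration: the function name findRepeatedNaive is hypothetical, and the sketch assumes every 10-letter window is stored as a full string key, which is exactly the memory cost the passage warns about.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: keeps every 10-letter window as a full string key.
std::vector<std::string> findRepeatedNaive(const std::string& s) {
    std::unordered_map<std::string, int> seen;  // window -> times seen so far
    std::vector<std::string> res;
    // i + 10 <= s.size() also covers strings shorter than 10 letters.
    for (std::size_t i = 0; i + 10 <= s.size(); ++i) {
        std::string window = s.substr(i, 10);
        // Count was 1 before this sighting: this is the second occurrence,
        // so the window is reported exactly once.
        if (seen[window]++ == 1)
            res.push_back(window);
    }
    return res;
}
```

Each key here is a 10-character string plus the container overhead, so the table can grow large on long inputs.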

So we need to save space. Instead of keeping the whole substring, can we convert it to another format? Well, you may have noticed that the substring contains only the letters A, T, C, and G. If we assign each letter 2 bits, then a 10-letter substring costs only 20 bits and can thus fit in a 32-bit integer, greatly reducing the memory used.
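To make the 2-bit idea concrete, here is a minimal sketch that packs one 10-letter substring into an integer. The names baseCode and encodeWindow are hypothetical, and the assignment A=0, C=1, G=2, T=3 is just one possible choice (it matches the mapping used in the solution in this passage).

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: 2-bit value of one letter (A=0, C=1, G=2, T=3).
// Any other character falls back to 0; valid input is assumed.
std::uint32_t baseCode(char c) {
    switch (c) {
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return 0;  // 'A'
    }
}

// Hypothetical helper: pack a window of up to 16 letters, 2 bits per letter,
// with the first letter ending up in the highest bits.
std::uint32_t encodeWindow(const std::string& window) {
    std::uint32_t code = 0;
    for (char c : window)
        code = (code << 2) | baseCode(c);
    return code;
}
```

For example, "ACGTACGTAC" becomes the bit pattern 00 01 10 11 00 01 10 11 00 01, which is 0x1B1B1, and the largest possible 10-letter code ("TTTTTTTTTT") is 0xfffff.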

Then you may put this idea into code and get a simple Accepted solution as follows. Congratulations!

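#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> res;
        if (s.length() < 10) return res;  // no 10-letter window exists
        unordered_map<int, int> mp;       // 20-bit code -> times seen
        int i = 0, code = 0;
        // Pack the first 9 letters; the loop below adds the 10th.
        while (i < 9)
            code = ((code << 2) | mapping(s[i++]));
        for (; i < (int)s.length(); i++) {
            // Shift in the new letter and keep only the low 20 bits.
            code = (((code << 2) & 0xfffff) | mapping(s[i]));
            // Report a window on its second occurrence only.
            if (mp[code]++ == 1)
                res.push_back(s.substr(i - 9, 10));
        }
        return res;
    }
private:
    int mapping(char s) {
        if (s == 'A') return 0;
        if (s == 'C') return 1;
        if (s == 'G') return 2;
        return 3;  // 'T' (the input contains only A, C, G, T)
    }
};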

Do you see the logic in the above code? Well, we first merge 9 letters into code. Then, each time we meet a new letter, we merge it into code with | mapping(s[i]) and drop the leftmost letter of the previous window with & 0xfffff (20 bits take 5 hexadecimal digits). Thus we have a code for the current 10-letter substring. We check whether it has occurred exactly once before to decide whether to push it to the result.
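The shift-and-mask step can be isolated into a tiny function to see what it does on concrete values. This is a sketch only; the name slide is hypothetical, and letter is assumed to be the 2-bit value (0 to 3) of the incoming letter.

```cpp
#include <cstdint>

// Hypothetical helper: move the 20-bit window one letter to the right.
// Shifting left by 2 makes room for the new letter, the mask discards the
// two bits of the letter that just left the window, and the OR appends
// the new letter in the lowest two bits.
std::uint32_t slide(std::uint32_t code, std::uint32_t letter) {
    return ((code << 2) & 0xfffffu) | letter;
}
```

With A=0, C=1, G=2, T=3, the window "ACGTACGTAC" has code 0x1B1B1. Appending G gives slide(0x1B1B1, 2) == 0x6C6C6, which is the code of "CGTACGTACG", so the new code never has to be rebuilt from scratch.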

The above code can still be shortened using tricks from the above link. In fact, if we encode A, T, C, G using 3 bits each, the code will be as short as 10 lines! Refer to the above link to learn more!
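One way the 3-bit idea might look is sketched below. It is not necessarily the code from the linked discussion. It assumes ASCII input, where the low three bits of the letters already differ ('A' & 7 is 1, 'C' & 7 is 3, 'G' & 7 is 7, 'T' & 7 is 4), so no mapping function is needed; ten letters then take 30 bits, hence the mask 0x3fffffff. The function name findRepeated3Bit is hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of the 3-bit variant; assumes ASCII letters A, C, G, T.
std::vector<std::string> findRepeated3Bit(const std::string& s) {
    std::unordered_map<unsigned, int> count;  // 30-bit code -> times seen
    std::vector<std::string> res;
    unsigned code = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Keep the low 30 bits (10 letters x 3 bits) and append s[i] & 7.
        code = ((code << 3) & 0x3fffffffu) | (s[i] & 7u);
        // The first full window ends at index 9; report on second sighting.
        if (i >= 9 && count[code]++ == 1)
            res.push_back(s.substr(i - 9, 10));
    }
    return res;
}
```

The trade-off is 30 bits per window instead of 20, which still fits in a 32-bit integer while removing the separate warm-up loop and the mapping helper.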
