All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
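For example, given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", the function should return ["AAAAACCCCC", "CCCCCAAAAA"].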
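The basic idea is easy to come up with: iterate over every substring of length 10 in the input string, use a HashMap to count how many times each one occurs, and add those that occur two or more times to the result.
In the implementation, adding a substring to the result only at its second occurrence keeps duplicate strings out of the result. A direct implementation, however, gets Memory Limit Exceeded: storing every 10-letter substring as a String key costs too much memory.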
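The direct approach described above can be sketched as follows. This is a minimal illustration, not the judged solution: the class name `NaiveRepeatedDna` and the method name `findRepeated` are hypothetical, and the input string is chosen only for demonstration. It still keeps one String key per distinct window, so it shares the memory cost mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name, used only for this illustration.
class NaiveRepeatedDna {
    static List<String> findRepeated(String s) {
        List<String> result = new ArrayList<>();
        if (s == null) {
            return result;
        }
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // merge returns the updated count for this window
            int seen = counts.merge(window, 1, Integer::sum);
            if (seen == 2) {
                // report a window exactly once, at its second occurrence
                result.add(window);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

Because a window is added only when its count reaches 2, a third or fourth occurrence does not add it again.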
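The mask trick in the reference solution is worth learning. Each nucleotide is encoded in 2 bits, so a 10-letter window fits in a 20-bit integer. The constant 0x3ffff, called eraser, is an 18-bit mask (eighteen 1-bits). To slide the window by one character, compute hint & eraser to drop the oldest character and keep the codes of the last 9, shift the result left by two bits, and add the code of the new character: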
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Solution {
    // Method 2: store int codes in the HashMap instead of Strings to bypass MLE
    public static final int eraser = 0x3ffff; // 18 one-bits: keeps the last 9 characters
    public static HashMap<Character, Integer> ati = new HashMap<Character, Integer>();
    static {
        ati.put('A', 0);
        ati.put('C', 1);
        ati.put('G', 2);
        ati.put('T', 3);
    }

    public List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<String>();
        if (s == null || s.length() <= 10) return result;
        int N = s.length();
        // Encode the first 10-letter window, 2 bits per character.
        int hint = 0;
        for (int i = 0; i < 10; i++) {
            hint = (hint << 2) + ati.get(s.charAt(i));
        }
        HashMap<Integer, Integer> checker = new HashMap<Integer, Integer>();
        checker.put(hint, 1);
        for (int i = 10; i < N; i++) {
            // Drop the oldest character, then append the new one.
            hint = ((hint & eraser) << 2) + ati.get(s.charAt(i));
            Integer value = checker.get(hint);
            if (value == null) {
                checker.put(hint, 1);
            } else if (value == 1) {
                // Second occurrence: report the window once.
                checker.put(hint, value + 1);
                result.add(s.substring(i - 9, i + 1));
            }
        }
        return result;
    }

    // Method 1: String keys in the HashMap; gets Memory Limit Exceeded
    public List<String> findRepeatedDnaSequences1(String s) {
        HashMap<String, Integer> map = new HashMap<String, Integer>();
        int last = s.length() - 10;
        for (int i = 0; i <= last; i++) {
            String key = s.substring(i, i + 10);
            if (map.containsKey(key)) {
                map.put(key, map.get(key) + 1);
            } else {
                map.put(key, 1);
            }
        }
        List<String> result = new ArrayList<String>();
        for (String key : map.keySet()) {
            if (map.get(key) > 1) result.add(key);
        }
        return result;
    }
}
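To see why the eraser mask works, it may help to look at the encoding in isolation. The sketch below is an illustration under assumptions: the class `RollingCodeDemo` and its helpers `encode`, `roll`, and `decode` are hypothetical names that do not appear in the solution above. It checks that sliding an existing code by one character gives the same integer as encoding the next window from scratch.

```java
// Hypothetical class and helper names, used only for this illustration.
class RollingCodeDemo {
    // 0x3ffff has eighteen 1-bits, so it keeps the codes of the last 9 letters.
    static final int ERASER = 0x3ffff;

    static int codeOf(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Encode a window from scratch, 2 bits per letter (10 letters use 20 bits).
    static int encode(String window) {
        int code = 0;
        for (int i = 0; i < window.length(); i++) {
            code = (code << 2) | codeOf(window.charAt(i));
        }
        return code;
    }

    // Slide a 10-letter window by one: drop the oldest letter, append the new one.
    static int roll(int code, char next) {
        return ((code & ERASER) << 2) | codeOf(next);
    }

    // Turn a 20-bit code back into its 10-letter window.
    static String decode(int code) {
        char[] letters = new char[10];
        for (int i = 9; i >= 0; i--) {
            letters[i] = "ACGT".charAt(code & 3);
            code >>>= 2;
        }
        return new String(letters);
    }

    public static void main(String[] args) {
        String s = "ACGAATTCCGT";
        int first = encode(s.substring(0, 10));   // code of "ACGAATTCCG"
        int second = roll(first, s.charAt(10));   // code of "CGAATTCCGT"
        System.out.println(second == encode(s.substring(1, 11))); // prints "true"
        System.out.println(decode(second));                       // prints "CGAATTCCGT"
    }
}
```

Since every 10-letter window maps to a distinct integer below 2^20, the map can use Integer keys without any risk of collisions between different windows.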
