1. Brute-force substring search
-- Check for pattern starting at each text position.
-- Java Implementation
public static int search(String pat, String txt) { int M = pat.length(); int N = txt.length(); for (int i = 0; i <= N - M; i++) { int j; for (j = 0; j < M; j++) if (txt.charAt(i+j) != pat.charAt(j)) break; if (j == M) return i; } return N; }
-- Brute-force algorithm can be slow if text and pattern are repetitive: ~ M N char compares.
-- Same sequence of char compares as previous implementation.
- i points to end of sequence of already-matched chars in text.
- j stores # of already-matched chars (end of sequence in pattern).
- Java Implementation
public static int search(String pat, String txt) { int i, N = txt.length(); int j, M = pat.length(); for (i = 0, j = 0; i < N && j < M; i++) { if (txt.charAt(i) == pat.charAt(j)) j++; else { i -= j; j = 0; } // explicit backup, we can add if(i >= N-M) break; } if (j == M) return i - M; else return N; }
-- In many applications, we want to avoid backup in text stream. Brute-force algorithm needs backup for every mismatch.
2. Knuth-Morris-Pratt substring search
-- DFA is abstract string-searching machine.
- Finite number of states (including start and halt).
- Exactly one transition for each char in alphabet.
- Accept if sequence of transitions leads to halt state.
-- State = number of characters in pattern that have been matched = length of longest prefix of pat[] that is a suffix of txt[0..i]
-- Java Implementation:
public int search(String txt) { int i, j, N = txt.length(); for (i = 0, j = 0; i < N && j < M; i++) j = dfa[txt.charAt(i)][j]; if (j == M) return i - M; else return N; }
-- Key differences from brute-force implementation.
- Need to precompute dfa[][] from pattern.
- Text pointer i never decrements.
-- Build DFA from pattern:
- Match transition: If in state j and next char c == pat.charAt(j), go to j+1.
- Mismatch transition: If in state j and next char c != pat.charAt(j), then the last j-1 characters of input are pat[1..j-1], followed by c. To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c.
- Takes only constant time if we maintain state X which is the state resulting from simulating pat[1...j-1] on DFA.
- Algorithm:
For each state j:
a) Copy dfa[][X] to dfa[][j] for mismatch case.
b) Set dfa[pat.charAt(j)][j] to j+1 for match case.
c) Update X.
- Java Implementation:
public KMP(String pat) { this.pat = pat; M = pat.length(); dfa = new int[R][M]; dfa[pat.charAt(0)][0] = 1; for (int X = 0, j = 1; j < M; j++) { for (int c = 0; c < R; c++) dfa[c][j] = dfa[c][X]; dfa[pat.charAt(j)][j] = j+1; X = dfa[pat.charAt(j)][X]; } }
- Running time: M character accesses (but space and time proportional to R M).
3. Boyer-Moore Substring Search
-- Scan characters in pattern from right to left.
-- Can skip as many as M text chars when finding one not in the pattern.
-- How much to skip:
- Mismatch character not in pattern : increment i one character beyond the mismatched character
- Mismatch character in pattern: align rightmost character in the pattern with the mismatched character without backup
- Mismatch character in pattern: increment i by 1. ( if alignment need backup)
-- Precompute index of rightmost occurrence of character c in pattern (-1 if character not in pattern).
right = new int[R]; for (int c = 0; c < R; c++) right[c] = -1; for (int j = 0; j < M; j++) right[pat.charAt(j)] = j;
-- Java Implementation:
public int search(String txt) { int N = txt.length(); int M = pat.length(); int skip; for (int i = 0; i <= N-M; i += skip) { skip = 0; for (int j = M-1; j >= 0; j--) { if (pat.charAt(j) != txt.charAt(i+j)) { skip = Math.max(1, j - right[txt.charAt(i+j)]); break; } } if (skip == 0) return i; } return N; }
-- Substring search with the Boyer-Moore mismatched character heuristic takes about ~ N / M character compares to search for a pattern of length M in a text of length N.
-- Can be as bad as ~ M N.
4. Rabin-Karp fingerprint search
-- Algorithm
-- Compute a hash of pattern characters 0 to M - 1.
-- For each i, compute a hash of text characters i to M + i - 1.
-- If pattern hash = text substring hash, check for a match.
-- Modular hash function. Using the notation ti for txt.charAt(i), compute :
xi = ti R^M-1 + ti+1 R^M-2 + … + ti+M-1 R^0 (mod Q)
-- Horner's method. Linear-time method to evaluate degree-M polynomial:
// Compute hash for M-digit key private long hash(String key, int M) { long h = 0; for (int j = 0; j < M; j++) h = (R * h + key.charAt(j)) % Q; return h; }
-- How to efficiently compute xi+1 given that we know xi:
xi+1 = ( xi – t i R^M–1 ) R + t i +M
-- Java Implementation
public class RabinKarp { private long patHash; // pattern hash value private int M; // pattern length private long Q; // modulus private int R; // radix private long RM; // R^(M-1) % Q public RabinKarp(String pat) { M = pat.length(); R = 256; Q = longRandomPrime(); //a large prime (but avoid overflow) RM = 1; //precompute R^(M – 1) (mod Q) for (int i = 1; i <= M-1; i++) RM = (R * RM) % Q; patHash = hash(pat, M); } private long hash(String key, int M) { /* as before */ } public int search(String txt) { int N = txt.length(); int txtHash = hash(txt, M); if (patHash == txtHash) return 0; for (int i = M; i < N; i++) { txtHash = (txtHash + Q - RM*txt.charAt(i-M) % Q) % Q; txtHash = (txtHash*R + txt.charAt(i)) % Q; if (patHash == txtHash) return i - M + 1; } return N; } }
-- Two versions of Implementation:
- Monte Carlo version: Return match if hash match.
- Las Vegas version: Check for substring match if hash match; continue search if false collision.
-- Advantages:
- Extends to 2d patterns.
- Extends to finding multiple patterns.
-- Disadvantages.
- Arithmetic ops slower than char compares.
- Las Vegas version requires backup.
- Poor worst-case guarantee.
5. Substring search cost summary
相关推荐
在`substring-search-master`这个项目中,你可能会找到类似的实现,包括单元测试和性能分析。理解并实践这个算法可以帮助你掌握字符串处理中的高级技巧,并为解决更复杂的问题打下基础。在实际应用中,还可以考虑...
zsh-history-substring-search 这是的历史记录搜索功能的无尘室实现,您可以在其中键入历史记录中任何命令的任何部分,然后按选定的键(例如UP和DOWN箭头)来循环进行匹配。 要求 4.3或更高版本 安装 使用软件包...
Perform advanced searching methods using Red-Black trees, AVL trees, and Trie trees, and take a look at several substring search algorithms Get to know about the data structures used in graphs and how...
5.3 Substring Search 758 5.4 Regular Expressions 788 5.5 Data Compression 810 6 Context . . . . . . . . . . . . . . . . . . . . . . . 853 Index . . . . . . . . . . . . . . . . . . . . . . . . . 933 ...
最后,"Experiments with a Very Fast Substring Search Algorithm.pdf"可能包含对某种快速搜索算法的实验分析,这可能是对BM、SUNDAY或其他算法的实证研究。 综上所述,BM算法和sunday算法是字符串搜索领域的经典...
AVL trees, and Trie trees, and take a look at several substring search algorithms Get to know about the data structures used in graphs and how to implement graphs such as depth-first search, breadth-...
【Java基本数据类型】 Java语言提供了八种基本数据类型,包括整型(byte, short, int, long)、浮点型(float, double)、字符型...同时,根据实验要求对`SubstringSearch`类的代码进行扩展,以实现(b)和(c)的功能。
串的长度可以是固定的,也可以是可变的,常见的操作有:拼接(concatenation)、查找子串(substring search)、替换(replacement)、分割(splitting)等。在实际应用中,串的处理涉及到正则表达式、模式匹配、...
串的常用操作包括连接(concatenation)、子串查找(substring search)、替换(replace)等。在文本处理、字符串解析等领域中,串的应用尤为常见。 4. **数组**:数组是一种简单的数据结构,它将同一类型的元素...
Volnitsky 字符串搜索算法是一种高效且实用的字符串匹配方法,主要应用于文本处理、数据挖掘和信息安全等领域。在 Go 语言环境下实现这个算法,我们可以充分利用 Go 的并发特性和内存管理优势,来提高搜索效率。...
matlab投射转相除代码 数据结构和算法 欢迎学习数据结构和算法! 数据结构类型 数据结构大致分为两类: ...Substring Search | |__________ Jump Search | |__________ Interpolation Search | |_______
`hstr`(History Substring Search)是一个方便的Bash命令行工具,它允许用户通过输入部分命令字符串来快速查找并执行历史命令。然而,当命令历史非常庞大时,一次性显示所有匹配项可能会导致终端屏幕信息过多,难以...
在SQL Server中,`SUBSTRING`函数是一个非常实用的字符串操作函数,主要用于从字符串中提取指定长度的部分。这个函数在不同的数据库系统中可能有不同的名称,但在SQL Server中,它的语法结构如下: ```sql ...
var searchParams = location.search.substring(1); // 去掉开头的问号 var paramsArray = searchParams.split('&'); var params = {}; paramsArray.forEach(function(param) { var keyValue = param.split('='); ...
if(search && search.length > 1){ var search = search.substring(1); var items = search.split(‘&’); for(var index = 0 ; index < items.length ; index++ ){ if(! items[index]){ continue; } var kv = ...
在ABAP中,可以使用`FIND`或`SEARCH`语句来查找一个字符串中的子串或模式。 **`FIND`语法示例**: ```abap FIND sub_string IN SECTION [OFFSET off] [LENGTH len] OF dobj --> [IN {BYTE|CHARACTER} MODE] [{...
说明 ⽬录 第⼀章 序章 关于 LeetCode 什么是 Cookbook 为什么会写这个开源书 关于书的封⾯ 关于作者 关于书中的代码 ⽬标读者 ...Binary Search ...3. Longest Substring Without Repeating Characters 4. ......
substring() 方法返回从 start 到 end 之间的字符,start、end 均为非负整数。若结束参数(end)省略,则表示从 start 位置始终截取到最终。 例如: var str = 'abcdefg'; str.substring(1, 4); //"bcd" str....