`
leonzhx
  • 浏览: 796653 次
  • 性别: Icon_minigender_1
  • 来自: 上海
社区版块
存档分类
最新评论
阅读更多

1.  Brute-force substring search

    --  Check for pattern starting at each text position.

    --  Java Implementation

 

public static int search(String pat, String txt)
{
    int M = pat.length();
    int N = txt.length();
    for (int i = 0; i <= N - M; i++)
    {
        int j;
        for (j = 0; j < M; j++)
            if (txt.charAt(i+j) != pat.charAt(j))
                break;
        if (j == M) return i;
    }
    return N;
}

 

    --  Brute-force algorithm can be slow if text and pattern are repetitive: ~ M N char compares.

 



    --  Same sequence of char compares as previous implementation.

        -  i points to end of sequence of already-matched chars in text.

        -  j stores # of already-matched chars (end of sequence in pattern).

        -  Java Implementation

 

public static int search(String pat, String txt)
{
    int i, N = txt.length();
    int j, M = pat.length();
    for (i = 0, j = 0; i < N && j < M; i++)
    {
        if (txt.charAt(i) == pat.charAt(j)) j++;
        else { i -= j; j = 0; } // explicit backup, we can add if(i >= N-M) break;
    }
    if (j == M) return i - M;
    else return N;
}

    --  In many applications, we want to avoid backup in text stream. Brute-force algorithm needs backup for every mismatch.

 

 

2.  Knuth-Morris-Pratt substring search

    --  DFA is abstract string-searching machine.

        -  Finite number of states (including start and halt).

        -  Exactly one transition for each char in alphabet.

        -  Accept if sequence of transitions leads to halt state.


    --  State = number of characters in pattern that have been matched = length of longest prefix of pat[] that is a suffix of txt[0..i]

    --  Java Implementation:

 

public int search(String txt)
{
    int i, j, N = txt.length();
    for (i = 0, j = 0; i < N && j < M; i++)
        j = dfa[txt.charAt(i)][j];
    if (j == M) return i - M;
    else return N;
}

    --  Key differences from brute-force implementation.

 

        -  Need to precompute dfa[][] from pattern.

        -  Text pointer i never decrements. 

    --  Build DFA from pattern:

        -  Match transition: If in state j and next char c == pat.charAt(j), go to j+1.

        -  Mismatch transition: If in state j and next char c != pat.charAt(j), then the last j-1 characters of input are pat[1..j-1], followed by c. To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c.

        -  Takes only constant time if we maintain state X which is the state resulting from simulating pat[1...j-1] on DFA.


        -  Algorithm:

            For each state j:

                a)  Copy dfa[][X] to dfa[][j] for mismatch case.

                b)  Set dfa[pat.charAt(j)][j] to j+1 for match case.

                c)  Update X.

        -  Java Implementation:

 

public KMP(String pat)
{
    this.pat = pat;
    M = pat.length();
    dfa = new int[R][M];
    dfa[pat.charAt(0)][0] = 1;
    for (int X = 0, j = 1; j < M; j++)
    {
        for (int c = 0; c < R; c++)
            dfa[c][j] = dfa[c][X];
        dfa[pat.charAt(j)][j] = j+1;
        X = dfa[pat.charAt(j)][X];
    }
}

         -  Running time: M character accesses (but space and time proportional to R M).

 

 

3.  Boyer-Moore Substring Search

    --  Scan characters in pattern from right to left.

    --  Can skip as many as M text chars when finding one not in the pattern.


    --  How much to skip:

        -  Mismatch character not in pattern : increment i one character beyond the mismatched character

        -  Mismatch character in pattern: align rightmost character in the pattern with the mismatched character without backup

        -  Mismatch character in pattern: increment i by 1. ( if alignment need backup)

    --  Precompute index of rightmost occurrence of character c in pattern (-1 if character not in pattern).

 

right = new int[R];
for (int c = 0; c < R; c++)
    right[c] = -1;
for (int j = 0; j < M; j++)
    right[pat.charAt(j)] = j;

 
     --  Java Implementation:

 

 

public int search(String txt)
{
    int N = txt.length();
    int M = pat.length();
    int skip;
    for (int i = 0; i <= N-M; i += skip)
    {
        skip = 0;
        for (int j = M-1; j >= 0; j--)
        {
            if (pat.charAt(j) != txt.charAt(i+j))
            {
                skip = Math.max(1, j - right[txt.charAt(i+j)]);
                break;
            }
        }
        if (skip == 0) return i;
    }
    return N;
}

    --  Substring search with the Boyer-Moore mismatched character heuristic takes about ~ N / M character compares to search for a pattern of length M in a text of length N.

 

    --  Can be as bad as ~ M N.

 

4.  Rabin-Karp fingerprint search

    --  Algorithm

        --  Compute a hash of pattern characters 0 to M - 1.

        --  For each i, compute a hash of text characters i to M + i - 1.

        --  If pattern hash = text substring hash, check for a match.

    --  Modular hash function. Using the notation ti for txt.charAt(i), compute :

        xi = ti R^M-1 + ti+1 R^M-2 + … + ti+M-1 R^0 (mod Q)

    --  Horner's method. Linear-time method to evaluate degree-M polynomial:

// Compute hash for M-digit key
private long hash(String key, int M)
{
    long h = 0;
    for (int j = 0; j < M; j++)
        h = (R * h + key.charAt(j)) % Q;
    return h;
}

 
    --  How to efficiently compute xi+1 given that we know xi:

        xi+1 = ( xi – t i R^M–1 ) R + t i +M



    --  Java Implementation

public class RabinKarp
{
    private long patHash; // pattern hash value
    private int M; // pattern length
    private long Q; // modulus
    private int R; // radix
    private long RM; // R^(M-1) % Q

    public RabinKarp(String pat) {
        M = pat.length();
        R = 256;
        Q = longRandomPrime(); //a large prime (but avoid overflow)
        RM = 1; //precompute R^(M – 1) (mod Q)
        for (int i = 1; i <= M-1; i++)
            RM = (R * RM) % Q;
        patHash = hash(pat, M);
    }

    private long hash(String key, int M)
    { /* as before */ }
 
    public int search(String txt)
    {
        int N = txt.length();
        int txtHash = hash(txt, M);
        if (patHash == txtHash) return 0;
        for (int i = M; i < N; i++)
        {
            txtHash = (txtHash + Q - RM*txt.charAt(i-M) % Q) % Q;
            txtHash = (txtHash*R + txt.charAt(i)) % Q;
            if (patHash == txtHash) return i - M + 1;
        }
        return N; 
    }
}

    --  Two versions of Implementation:

        -  Monte Carlo version: Return match if hash match.

        -  Las Vegas version: Check for substring match if hash match; continue search if false collision.

    --  Advantages:

        -  Extends to 2d patterns.

        -  Extends to finding multiple patterns.

    --  Disadvantages.

        -  Arithmetic ops slower than char compares.

        -  Las Vegas version requires backup.

        -  Poor worst-case guarantee.

 

5.  Substring search cost summary



 

  • 大小: 39.6 KB
  • 大小: 18.3 KB
  • 大小: 29.7 KB
  • 大小: 32.2 KB
  • 大小: 18.3 KB
  • 大小: 14.8 KB
  • 大小: 11.8 KB
  • 大小: 14.7 KB
  • 大小: 26.6 KB
  • 大小: 59.9 KB
  • 大小: 56.2 KB
分享到:
评论

相关推荐

    substring-search:子串搜索算法的实现

    在`substring-search-master`这个项目中,你可能会找到类似的实现,包括单元测试和性能分析。理解并实践这个算法可以帮助你掌握字符串处理中的高级技巧,并为解决更复杂的问题打下基础。在实际应用中,还可以考虑...

    zsh-history-substring-search::tropical_fish:ZSH港口的鱼类历史记录搜索(向上箭头)

    zsh-history-substring-search 这是的历史记录搜索功能的无尘室实现,您可以在其中键入历史记录中任何命令的任何部分,然后按选定的键(例如UP和DOWN箭头)来循环进行匹配。 要求 4.3或更高版本 安装 使用软件包...

    Swift Data Structure and Algorithms [2016]

    Perform advanced searching methods using Red-Black trees, AVL trees, and Trie trees, and take a look at several substring search algorithms Get to know about the data structures used in graphs and how...

    算法第四版(algorithms),2011年新出版,算法设计力作

    5.3 Substring Search 758 5.4 Regular Expressions 788 5.5 Data Compression 810 6 Context . . . . . . . . . . . . . . . . . . . . . . . 853 Index . . . . . . . . . . . . . . . . . . . . . . . . . 933 ...

    BM算法和sunday算法相关的原始论文

    最后,"Experiments with a Very Fast Substring Search Algorithm.pdf"可能包含对某种快速搜索算法的实验分析,这可能是对BM、SUNDAY或其他算法的实证研究。 综上所述,BM算法和sunday算法是字符串搜索领域的经典...

    Swift.Data.Structure.and.Algorithms

    AVL trees, and Trie trees, and take a look at several substring search algorithms Get to know about the data structures used in graphs and how to implement graphs such as depth-first search, breadth-...

    练习使用Java基本数据类型。使用Java的String类操作字符串和子串

    【Java基本数据类型】 Java语言提供了八种基本数据类型,包括整型(byte, short, int, long)、浮点型(float, double)、字符型...同时,根据实验要求对`SubstringSearch`类的代码进行扩展,以实现(b)和(c)的功能。

    算法-数据结构【栈、队列、串】复习题.rar

    串的长度可以是固定的,也可以是可变的,常见的操作有:拼接(concatenation)、查找子串(substring search)、替换(replacement)、分割(splitting)等。在实际应用中,串的处理涉及到正则表达式、模式匹配、...

    数据结构 线性表 栈 串 数组 树 图 排序 查找

    串的常用操作包括连接(concatenation)、子串查找(substring search)、替换(replace)等。在文本处理、字符串解析等领域中,串的应用尤为常见。 4. **数组**:数组是一种简单的数据结构,它将同一类型的元素...

    strsearch:实现 Volnitsky 字符串搜索算法,参考 http

    Volnitsky 字符串搜索算法是一种高效且实用的字符串匹配方法,主要应用于文本处理、数据挖掘和信息安全等领域。在 Go 语言环境下实现这个算法,我们可以充分利用 Go 的并发特性和内存管理优势,来提高搜索效率。...

    matlab辗转相除代码-Data_Structures_and_Algorithms.github.io:诸如C,C++,Java,Pyth

    matlab投射转相除代码 数据结构和算法 欢迎学习数据结构和算法! 数据结构类型 数据结构大致分为两类: ...Substring Search | |__________ Jump Search | |__________ Interpolation Search | |_______

    hstr-rs:hstr,但带有分页

    `hstr`(History Substring Search)是一个方便的Bash命令行工具,它允许用户通过输入部分命令字符串来快速查找并执行历史命令。然而,当命令历史非常庞大时,一次性显示所有匹配项可能会导致终端屏幕信息过多,难以...

    Sql Server中Substring函数的用法实例解析

    在SQL Server中,`SUBSTRING`函数是一个非常实用的字符串操作函数,主要用于从字符串中提取指定长度的部分。这个函数在不同的数据库系统中可能有不同的名称,但在SQL Server中,它的语法结构如下: ```sql ...

    location.search实现不同网页键数据传递简单的demo

    var searchParams = location.search.substring(1); // 去掉开头的问号 var paramsArray = searchParams.split('&'); var params = {}; paramsArray.forEach(function(param) { var keyValue = param.split('='); ...

    javascript 解析url的search方法

    if(search && search.length &gt; 1){ var search = search.substring(1); var items = search.split(‘&’); for(var index = 0 ; index &lt; items.length ; index++ ){ if(! items[index]){ continue; } var kv = ...

    ABAP常用字符串操作

    在ABAP中,可以使用`FIND`或`SEARCH`语句来查找一个字符串中的子串或模式。 **`FIND`语法示例**: ```abap FIND sub_string IN SECTION [OFFSET off] [LENGTH len] OF dobj --&gt; [IN {BYTE|CHARACTER} MODE] [{...

    Go 零基础2000题 从入门到精通

    说明 ⽬录 第⼀章 序章 关于 LeetCode 什么是 Cookbook 为什么会写这个开源书 关于书的封⾯ 关于作者 关于书中的代码 ⽬标读者 ...Binary Search ...3. Longest Substring Without Repeating Characters 4. ......

    Javascript字符串常用方法详解_.docx

    substring() 方法返回从 start 到 end 之间的字符,start、end 均为非负整数。若结束参数(end)省略,则表示从 start 位置始终截取到最终。 例如: var str = 'abcdefg'; str.substring(1, 4); //"bcd" str....

Global site tag (gtag.js) - Google Analytics