Longest common subsequence

 

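The longest common subsequence (LCS) problem should not be confused with the longest common substring problem: a substring must be contiguous, whereas a subsequence only has to keep its elements in the same relative order. We often want to know how similar two strings are, for example two short sentences or two DNA sequences. The diff tool is a well-known application.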
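To make the subsequence/substring distinction concrete, here is a minimal Python sketch. The helper name `is_subsequence` is my own choice for illustration and does not come from any library.

```python
def is_subsequence(z: str, x: str) -> bool:
    """Return True if z can be obtained from x by deleting zero or more characters."""
    it = iter(x)
    # `ch in it` consumes the iterator up to and including the first match,
    # so later characters of z can only match later positions of x.
    return all(ch in it for ch in z)


# "subsn" skips some characters of "substring": a subsequence, but not a substring.
print(is_subsequence("subsn", "substring"))  # True
print("subsn" in "substring")                # False
print("subs" in "substring")                 # True
```

Python's `in` operator on strings tests for a contiguous substring, which is why the second check fails while the first succeeds.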

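This similarity can be treated as a longest common subsequence problem. In dynamic programming terms, we are looking for an optimal solution to the problem. There is often more than one optimal solution, and LCS returns just one of them. LCS fits dynamic programming because it has both of the properties that dynamic programming requires.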

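Let's look at some definitions and theorems:


    Optimal substructure: an optimal solution to the problem contains optimal solutions to its subproblems.
    Overlapping subproblems: the same subproblems come up repeatedly, so their solutions can be computed once and reused.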


  Let:
    X = {x[1],x[2],...,x[m]}
    Y = {y[1],y[2],...,y[n]}
    and let Z be an LCS of X and Y:
    Z = {z[1],z[2],...,z[k]}


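  Optimal substructure of LCS (here X[i] denotes the prefix {x[1],...,x[i]} of X, and likewise for Y[j] and Z[k]):

    (1) If x[m] = y[n], then z[k] = x[m] = y[n], and Z[k - 1] is an LCS of X[m - 1] and Y[n - 1].
    (2) If x[m] ≠ y[n], then z[k] ≠ x[m] implies that Z is an LCS of X[m - 1] and Y.
    (3) If x[m] ≠ y[n], then z[k] ≠ y[n] implies that Z is an LCS of X and Y[n - 1].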


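  The optimal substructure of LCS gives a recurrence, and the recurrence shows that LCS has overlapping subproblems (for example, c[i, j - 1] and c[i - 1, j] can both depend on c[i - 1, j - 1]):

  Let:
    c[i,j] be the length of an LCS of the prefixes X[i] and Y[j]

               |
               | (1) 0                                     if i = 0 or j = 0
    c[i,j] =  <  (2) c[i - 1, j - 1] + 1                   if i,j > 0 and x[i] = y[j]
               | (3) max( c[i, j - 1], c[i - 1, j])        if i,j > 0 and x[i] ≠ y[j]
               |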
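The recurrence can be transcribed almost literally into a top-down, memoized function. This is only a sketch: the name `lcs_length_recursive` is an illustrative assumption, and because Python strings are 0-based, x[i] in the recurrence becomes `x[i - 1]` in the code.

```python
from functools import lru_cache


def lcs_length_recursive(x: str, y: str) -> int:
    """Length of an LCS of x and y, computed top-down from the recurrence."""

    @lru_cache(maxsize=None)
    def c(i: int, j: int) -> int:
        # Case (1): an empty prefix has an empty LCS.
        if i == 0 or j == 0:
            return 0
        # Case (2): the last characters of the two prefixes match.
        if x[i - 1] == y[j - 1]:
            return c(i - 1, j - 1) + 1
        # Case (3): drop the last character of one prefix or the other.
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(x), len(y))


print(lcs_length_recursive("substring", "subsequence"))  # 5
```

The `lru_cache` decorator is what exploits the overlapping subproblems: each pair (i, j) is evaluated once and then reused. For long inputs the recursion depth can exceed Python's default limit, which is one reason to prefer a bottom-up table.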

Now two short pieces of pseudocode:

    /*
     * Compute the length of an LCS.
     * O(mn) time and space.
     * x and y are 0-indexed; c is an (m + 1) x (n + 1) table.
     */
    LCS-LENGTH(x, y)
        m = LEN(x)
        n = LEN(y)
        c = new table [0..m][0..n]
        for i = 0 to m
            c[i, 0] = 0
        for j = 0 to n
            c[0, j] = 0
        for i = 1 to m
            for j = 1 to n
                if x[i - 1] = y[j - 1]
                    c[i, j] = c[i - 1, j - 1] + 1
                else
                    c[i, j] = max(c[i, j - 1], c[i - 1, j])

        return c[m, n]

    /*
     * Recover one LCS by tracing back through the table c
     * filled in by LCS-LENGTH.
     * O(m + n)
     */
    LCS(x, y, c)
        m = LEN(x)
        n = LEN(y)
        i = m
        j = n
        r = new array of length c[m, n]
        k = LEN(r) - 1
        while i > 0 and j > 0
            if x[i - 1] = y[j - 1]
                r[k] = x[i - 1]
                i = i - 1; j = j - 1; k = k - 1
            else if c[i - 1, j] >= c[i, j - 1]
                i = i - 1
            else
                j = j - 1

        return r
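For readers who want something runnable, here is a sketch of the two routines in Python. The function names `lcs_table` and `lcs` are my own. Unlike LCS-LENGTH, `lcs_table` returns the whole table rather than only c[m, n], because the traceback needs every entry.

```python
def lcs_table(x: str, y: str) -> list:
    """Fill the (m + 1) x (n + 1) table of LCS lengths bottom-up in O(mn)."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay 0: case (1) of the recurrence.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c


def lcs(x: str, y: str) -> str:
    """Recover one LCS by walking back from c[m][n] in O(m + n) steps."""
    c = lcs_table(x, y)
    i, j = len(x), len(y)
    out = []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])  # this character belongs to the LCS
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1  # move up (ties also go up, as in the pseudocode)
        else:
            j -= 1  # move left
    # Characters were collected from the end, so reverse them.
    return "".join(reversed(out))


print(lcs("substring", "subsequence"))  # subsn
```

Collecting characters in a list and reversing at the end replaces the pre-sized array r and the index k from the pseudocode. The result is the same.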





Example:

    X = {substring}
        { s,    u,    b,    s,    t,    r,    i,    n,    g  }
        {x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8]}

    Y = {subsequence}
        { s,    u,    b,    s,    e,    q,    u,    e,    n,    c,    e  }
        {y[0], y[1], y[2], y[3], y[4], y[5], y[6], y[7], y[8], y[9], y[10]}

    Z = {subsn}
        { s,    u,    b,    s,    n  }
        {z[0], z[1], z[2], z[3], z[4]}

    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    |   | Y | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | X |   |   | s | u | b | s | e | q | u | e | n | c | e |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 0 |   | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 1 | s | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 2 | u | 0 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 3 | b | 0 | 1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 4 | s | 0 | 1 | 2 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 5 | t | 0 | 1 | 2 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 6 | r | 0 | 1 | 2 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 7 | i | 0 | 1 | 2 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 8 | n | 0 | 1 | 2 | 3 | 4 | 4 | 4 | 4 | 4 | 5 | 5 | 5 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 9 | g | 0 | 1 | 2 | 3 | 4 | 4 | 4 | 4 | 4 | 5 | 5 | 5 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+


    Notice the extra row and column of zeros at the start of X and Y? They are there for case (1) of the recurrence.

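    Next:

      (↖ = \ = upper left) means x[i] = y[j]
      (↑ = ^ = up) means x[i] ≠ y[j] and c[i - 1][j] > c[i][j - 1]
      (← = < = left) means x[i] ≠ y[j] and c[i - 1][j] ≤ c[i][j - 1]

      Ties go left in the tables below. The LCS pseudocode above moves up on ties instead, which traces a different path here but still recovers a valid LCS.


      Then follow the $ (dollar sign) marks back from the bottom-right cell toward the top-left, and the picture emerges: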
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    |   | Y | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | X |   |   | s | u | b | s | e | q | u | e | n | c | e |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 0 |   |   |   |   |   |   |   |   |   |   |   |   |   |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 1 | s |   | \$| < | < | \ | < | < | < | < | < | < | < |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 2 | u |   | ^ | \$| < | < | < | < | \ | < | < | < | < |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 3 | b |   | ^ | ^ | \$| < | < | < | < | < | < | < | < |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 4 | s |   | \ | ^ | ^ | \$| < | < | < | < | < | < | < |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 5 | t |   | ^ | ^ | ^ | ^$| < | < | < | < | < | < | < |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 6 | r |   | ^ | ^ | ^ | ^$| < | < | < | < | < | < | < |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 7 | i |   | ^ | ^ | ^ | ^$| <$| <$| <$| <$| < | < | < |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 8 | n |   | ^ | ^ | ^ | ^ | < | < | < | < | \$| < | < |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 9 | g |   | ^ | ^ | ^ | ^ | < | < | < | < | ^$| <$| <$|
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+


    And a combined view of both tables:
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    |   | Y | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | X |   |   | s | u | b | s | e | q | u | e | n | c | e |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 0 |   | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 1 | s | 0 |\1$|<1 |<1 |\1 |<1 |<1 |<1 |<1 |<1 |<1 |<1 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 2 | u | 0 |^1 |\2$|<2 |<2 |<2 |<2 |\2 |<2 |<2 |<2 |<2 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 3 | b | 0 |^1 |^2 |\3$|<3 |<3 |<3 |<3 |<3 |<3 |<3 |<3 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 4 | s | 0 |\1 |^2 |^3 |\4$|<4 |<4 |<4 |<4 |<4 |<4 |<4 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 5 | t | 0 |^1 |^2 |^3 |^4$|<4 |<4 |<4 |<4 |<4 |<4 |<4 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 6 | r | 0 |^1 |^2 |^3 |^4$|<4 |<4 |<4 |<4 |<4 |<4 |<4 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 7 | i | 0 |^1 |^2 |^3 |^4$|<4$|<4$|<4$|<4$|<4 |<4 |<4 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 8 | n | 0 |^1 |^2 |^3 |^4 |<4 |<4 |<4 |<4 |\5$|<5 |<5 |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 9 | g | 0 |^1 |^2 |^3 |^4 |<4 |<4 |<4 |<4 |^5$|<5$|<5$|
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
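The combined table can also be generated by a short program. The sketch below assumes the tie rule the tables follow (when the cell above and the cell to the left hold equal values, the arrow points left). The names `direction_table` and `traceback_path` are hypothetical helpers, not from the original post.

```python
def direction_table(x: str, y: str):
    """Return (c, d): LCS lengths and the arrow stored in each cell."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    d = [[" "] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                d[i][j] = "\\"  # upper left: characters match
            elif c[i - 1][j] > c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                d[i][j] = "^"   # up: strictly larger value above
            else:
                c[i][j] = c[i][j - 1]
                d[i][j] = "<"   # left: larger or equal value to the left
    return c, d


def traceback_path(d):
    """Follow the arrows from the bottom-right cell; return the visited cells."""
    i, j = len(d) - 1, len(d[0]) - 1
    path = []
    while i > 0 and j > 0:
        path.append((i, j))
        if d[i][j] == "\\":
            i, j = i - 1, j - 1
        elif d[i][j] == "^":
            i -= 1
        else:
            j -= 1
    return path


c, d = direction_table("substring", "subsequence")
on_path = set(traceback_path(d))
for i in range(1, len(d)):
    cells = []
    for j in range(1, len(d[0])):
        mark = "$" if (i, j) in on_path else " "
        cells.append(f"{d[i][j]}{c[i][j]}{mark}")
    print("|" + "|".join(cells) + "|")
```

Each printed row holds the arrow, the length, and a `$` for cells on the traceback path, in the same cell format as the combined table, without the index and letter columns.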
