最长公共子序列(Longest common subsequence,LCS),不要跟最长公共子串(Longest common substring)搞混淆了。在很多情况下,我们想知道两个串有多相似,例如:两个短句,又或者两个DNA序列(DNA Sequence),也有一个富有代表性的工具diff。
这个相似度,我们可以看作一个最长公共子序列问题,在动态规划(Dynamic Programming)里,就是求问题的最优解,很多情况下,问题的最优解不只一个,LCS只是取出其中一个。好了,LCS跟动态规划拉上关系了,因为动态规划的两个必要性质LCS都具备了。
接下看看一些定理:
最优子结构
:问题的最优解所包含的子问题的解也是最优解。
重叠子问题
:问题的最优解可以复用子问题的解。
设:
X = {x[1],x[2],...,x[m]}
Y = {y[1],y[2],...,y[n]}
然后有 X 和 Y 的一个LCS:
Z = {z[1],z[2],...,z[k]}
LCS的最优子结构:
(1) 如果 x[m] = y[n],则 z[k] = x[m] = y[n],且 Z[k - 1] 是 X[m - 1] 和 Y[n - 1] 的一个LCS。
(2) 如果 x[m] ≠ y[n],则 z[k] ≠ x[m] 蕴含 Z 是 X[m - 1] 和 Y 的一个LCS。
(3) 如果 x[m] ≠ y[n],则 z[k] ≠ y[n] 蕴含 Z 是 X 和 Y[n - 1] 的一个LCS。
由LCS的最优子结构得出一个递归式,这个递归式可以说明LCS具有重叠子问题性质:
设:
c[i,j] 为 X[i] 和 Y[j] 的一个LCS的长度
|
| (1) 0 如果 i = 0 或 j = 0
c[i,j] = < (2) c[i - 1, j - 1] + 1 如果 i,j > 0 且 x[i] = y[j]
| (3) max( c[i, j - 1], c[i - 1, j]) 如果 i,j > 0 且 x[i] ≠ y[j]
|
再看看两段简单代码:
/*
* 计算LCS的长度。
* O(mn)
*/
LCS-LENGTH(x, y)
1 m = LEN(x)
2 n = LEN(y)
3 c = [m + 1][n + 1]
4 for i = 0 to m
5 c[i,0] = 0
6 for j = 0 to n
7 c[0,j] = 0
8 for (i = 1 to m)
9 for (j = 1 to n)
10 if (x[i - 1] = y[j - 1])
11 c[i, j] = c[i - 1, j - 1] + 1
12 else
13 c[i, j] = max(c[i, j - 1], c[i - 1, j])
14
15 return c[m,n]
/*
* 计算LCS
* O(m + n)
*/
LCS(x, y, c[][])
1 m = LEN(x)
2 n = LEN(y)
3 i = m
4 j = n
5 r = [c[m,n]]
6 k = LEN(r) - 1
7 while (i > 0 && j > 0)
8 if (x[i - 1] = y[j - 1]) {
9 r[k] = x[i - 1]
10 i--; j--; k--
11 }
12 else if (c[i - 1][j] >= c[i][j - 1])
13 i--
14 else
15 j--
16
17 return r;
例子:
X = {substring}
{ s, u, b, s, t, r, i, n, g }
{x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8]}
Y = {subsequence}
{ s, u, b, s, e, q, u, e, n, c, e, }
{y[0], y[1], y[2], y[3], y[4], y[5], y[6], y[7], y[8], y[9], y[10]}
Z = {subsn}
{ s, u, b, s, n }
{z[0], z[1], z[2], z[3], z[4]}
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | Y | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| X | | | s | u | b | s | e | q | u | e | n | c | e |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 0 | | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 1 | s | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 2 | u | 0 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 3 | b | 0 | 1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 4 | s | 0 | 1 | 2 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 5 | t | 0 | 1 | 2 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 6 | r | 0 | 1 | 2 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 7 | i | 0 | 1 | 2 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 8 | n | 0 | 1 | 2 | 3 | 4 | 4 | 4 | 4 | 4 | 5 | 5 | 5 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 9 | g | 0 | 1 | 2 | 3 | 4 | 4 | 4 | 4 | 4 | 5 | 5 | 5 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
有发现在X和Y的头部都多了一位0么??那就是为了递归式里的(1)。
再来:
(↖ = \ = 左上角) 表示 x[m] = y[n]
(↑ = ^ = 上) 表示 c[i - 1][j] ≥ c[i][j - 1]
(← = < = 左) 表示 c[i - 1][j] < c[i][j - 1]
然后跟着$(美元符号)自底向上回溯,图像就出来啦
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | Y | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| X | | | s | u | b | s | e | q | u | e | n | c | e |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 0 | | | | | | | | | | | | | |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 1 | s | | \$| < | < | \ | < | < | < | < | < | < | < |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 2 | u | | ^ | \$| < | < | < | < | \ | < | < | < | < |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 3 | b | | ^ | ^ | \$| < | < | < | < | < | < | < | < |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 4 | s | | \ | ^ | ^ | \$| < | < | < | < | < | < | < |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 5 | t | | ^ | ^ | ^ | ^$| < | < | < | < | < | < | < |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 6 | r | | ^ | ^ | ^ | ^$| < | < | < | < | < | < | < |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 7 | i | | ^ | ^ | ^ | ^$| <$| <$| <$| <$| < | < | < |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 8 | n | | ^ | ^ | ^ | ^ | < | < | < | < | \$| < | < |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 9 | g | | ^ | ^ | ^ | ^ | < | < | < | < | ^$| <$| <$|
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
再来一个综合的:
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | Y | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| X | | | s | u | b | s | e | q | u | e | n | c | e |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 0 | | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 1 | s | 0 |\1$|<1 |<1 |\1 |<1 |<1 |<1 |<1 |<1 |<1 |<1 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 2 | u | 0 |^1 |\2$|<2 |<2 |<2 |<2 |\2 |<2 |<2 |<2 |<2 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 3 | b | 0 |^1 |^2 |\3$|<3 |<3 |<3 |<3 |<3 |<3 |<3 |<3 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 4 | s | 0 |\1 |^2 |^3 |\4$|<4 |<4 |<4 |<4 |<4 |<4 |<4 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 5 | t | 0 |^1 |^2 |^3 |^4$|<4 |<4 |<4 |<4 |<4 |<4 |<4 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 6 | r | 0 |^1 |^2 |^3 |^4$|<4 |<4 |<4 |<4 |<4 |<4 |<4 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 7 | i | 0 |^1 |^2 |^3 |^4$|<4$|<4$|<4$|<4$|<4 |<4 |<4 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 8 | n | 0 |^1 |^2 |^3 |^4 |<4 |<4 |<4 |<4 |\5$|<5 |<5 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 9 | g | 0 |^1 |^2 |^3 |^4 |<4 |<4 |<4 |<4 |^5$|<5$|<5$|
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
分享到:
相关推荐
这是动态规划中,求最长公共子序列(Longest common string)的源代码。自己编写执行。程序简单,有注释。
LCS(longest common substring)算法,即最大公共子串,它是求两个字符串最长公共子串的问题。大体解法是用一个矩阵来记录两个字符串中所有位置的两个字符之间的匹配情况,若是匹配则为1,否则为0。然后求出对角线最长...
【Longest Common Ancestor (LCA)问题】是图论中的一个重要概念,特别是在树结构的处理中。给定树T中的两个节点u和v,LCA指的是离根节点最远的公共祖先,即同时是u和v祖先的那个节点。解决LCA问题可以转化为求解...
最长公共子序列(Longest Common Subsequence, LCS)问题是计算机科学领域中一个重要的经典问题,它不仅被广泛应用于文本比较、生物信息学中的基因序列比对等领域,而且也是动态规划算法的一个典型应用场景。...
LeetCode Longest Common Prefix解决方案 该解决方案旨在解决LeetCode平台上的一道编程题目,即Longest Common Prefix(最长公共前缀),该问题要求在一个字符串数组中找到最长的公共前缀字符串。如果没有公共前缀...
通过一系列测试用例,如:"fosh"和"fish"、"fish"和"hish"、"lucider"和"lucifer"、"hahaui"和"hfui"以及"sasa"和"fgdfrsa",检查`longestCommonSequence`函数的输出是否符合预期。 动态规划解决问题的核心在于将...
4. 额外的测试部分:定义了一个`expect`函数来检验`longestCommonSubstring`函数的输出是否符合预期。通过传入实际的计算结果和期望的值,用toBe方法进行比较,并打印出比较结果,方便调试。 在给出的测试用例中,...
Longest Common Extension with RecompressionTomohiro I Kyushu Institute of Technology, Japantomohiro@ai.kyutech.ac.jpAbstractGiven two positions i and j in a string T of length N , a longest common ...
C#,最长公共扩展(LCE,Longest Common Extention)的算法与源代码 考虑一个字符串s,并为每对(L,R)计算从L和R开始的s的最长子字符串。 在LCE中,在每个查询中,我们必须回答从索引L和R开始的最长公共前缀的长度...
C#,最大公共子序列(LCS,Longest Common Subsequences)的算法与源代码 最长的常见子序列问题是寻找两个给定字符串中存在的最长序列。 最大公共子序列算法,常用于犯罪鉴定、亲子鉴定等等的 DNA 比对。 1.1 子...
在编程领域,寻找两个字符串之间的最长公共子字符串(Longest Common Substring,LCS)是一个经典的问题,它在文本处理、生物信息学以及许多其他应用中都有重要用途。本问题要求我们编写一个程序来解决这个问题,...
Journal of Discrete Algorithms 8 (2010) 418–428Contents lists available at ScienceDirectJournal of Discrete Algorithmswww.elsevier.com/locate/jdaThe longest common extension problem revisited and ...
### 最长公共子序列(Longest Common Subsequence, LCS) #### 定义与概念 最长公共子序列问题(LCS)是计算机科学中的一个经典问题,它涉及到在两个或多个序列中寻找最长的相同子序列。这里所说的“子序列”并不...
最长公共子字符串共有两种解决方法,下面具体说说我的思路方法一:Longest Common Substring和Longest Common Subsequence是有区别的X = <a>Y = <a>X和Y的Longest Common Sequence为,长度为4X和Y的Longest Common ...
c c语言_leetcode 0014_longest_common_prefix.zip
java入门 java_leetcode题解之014_Longest_Common_Prefix
c语言入门 c语言_leetcode题解之1143_longest_common_subsequence
js js_leetcode题解之14-longest-common-prefix.js
c语言入门 C语言_leetcode题解之14-longest-common-prefix.c
最长公共子序列(Longest Common Subsequence,LCS)是计算机科学中一种经典的字符串问题,主要涉及算法设计和分析,特别是在序列比对、文本编辑距离等领域有着广泛应用。在这个问题中,我们有两个给定的字符串,...