`
kenby
  • 浏览: 723787 次
  • 性别: Icon_minigender_1
  • 来自: 北京
社区版块
存档分类
最新评论

最大公共子字符串(Longest Common Substring)

 
阅读更多

Longest Common Substring和Longest Common Subsequence是有区别的

X = <a, b, c, f, b, c>

Y = <a, b, f, c, a, b>

X和Y的Longest Common Sequence为<a, b, c, b>,长度为4

X和Y的Longest Common Substring为 <a, b>长度为2

其实Substring问题是Subsequence问题的特殊情况,也是要找两个递增的下标序列

<i1, i2, ...ik> 和 <j1, j2, ..., jk>使

xi1 == yj1

xi2 == yj2

......

xik == yjk

与Subsequence问题不同的是,Substring问题不光要求下标序列是递增的,还要求每次

递增的增量为1, 即两个下标序列为:

<i, i+1, i+2, ..., i+k-1> 和 <j, j+1, j+2, ..., j+k-1> 

 

类比Subquence问题的动态规划解法,Substring也可以用动态规划解决,令

c[i][j]表示以X[i]和Y[i]结尾的公共子串的长度,如果X[i]不等于Y[i],则c[i][j]等于0, 比如

X = <y, e, d, f>

Y = <y, e, k, f>

c[1][1] = 1

c[2][2] = 2

c[3][3] = 0

c[4][4] = 1

动态转移方程为:

如果xi == yj, 则 c[i][j] = c[i-1][j-1]+1

如果xi ! = yj,  那么c[i][j] = 0

 

最后求Longest Common Substring的长度等于

max{c[i][j], 1<=i<=n, 1<=j<=m}

 

#include <stdio.h>
#include <string.h>

//#define DEBUG

#ifdef DEBUG
#define debug(...) printf( __VA_ARGS__) 
#else
#define debug(...)
#endif

#define N 250

int 	c[N][N];

void print_str(char *s1, char *s2, int i, int j)
{
	if (s1[i] == s2[j]) {
		print_str(s1, s2, i-1, j-1);
		putchar(s1[i]);
	}
}

int common_str(char *s1, char *s2)
{
	int		i, j, n, m, max_c;
	int		x, y;

	n = strlen(s1);
	m = strlen(s2);

	max_c = -1;
	for (i = 1; i <= n; i++) {
		for (j = 1; j <= m; j++) {
			if (s1[i-1] == s2[j-1]) {
				c[i][j] = c[i-1][j-1] + 1;
			}
			else {
				c[i][j] = 0;
			}
			if (c[i][j] > max_c) {
				max_c = c[i][j];
				x = i;
				y = j;
			}
			debug("c[%d][%d] = %d\n", i, j, c[i][j]);
		}
	}

	print_str(s1, s2, x-1, y-1);
	printf("\n");
	return max_c;
}

int main()
{
	char 	s1[N], s2[N];

	while (scanf("%s%s", s1, s2) != EOF) {
		debug("%s %s\n", s1, s2);
		printf("%d\n", common_str(s1, s2));
	}
	return 0;
}
 

 

分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics