1. Problem Definition:
a) Input: Strings X = x1 ... xm, Y = y1 ... yn over some alphabet Sigma (like {A,C,G,T})
b) Penalty Pgap for inserting a gap, Pab for matching a & b [presumably Pab = 0 of a = b]
c) Feasible solutions: Alignments - i.e., insert gaps to equalize lengths of the string
d) Goal: Alignment with minimum possible total penalty
e) Example :
2. A Dynamic Programming Approach
-- Key step: Identify subproblems. As usual, will look at structure of an optimal solution for clues.
[i.e., develop a recurrence + then reverse engineer the subproblems]
-- Structure of optimal solution: Consider an optimal alignment of X , Y and its final position:
-- Three cases :
a) Case 1: xm , yn matched
b) Case 2: xm matched with a gap
c) Case 3: yn matched with a gap
[Pointless to have 2 gaps]
-- Optimal substructure: Let X' = X - xm, Y' = Y - yn.
a) If case (1) holds, then induced alignment of X' & Y' is optimal.
b) If case (2) holds, then induced alignment of X' & Y is optimal.
c) If case (3) holds, then induced alignment of X & Y' is optimal.
3. Optimal Substructure Proof
-- Proof of Case 1, other cases are similar :
By contradiction. Suppose induced alignment of X' , Y' has penalty P while some other one has penalty P* < P.
==> Appending xm , yn to the latter, get an alignment of X and Y with penalty P + Pxmyn < P + Pxmyn ==> Contradicts optimality of original alignment of X & Y .
4. The Recurrence
-- Notation: Let Xi = 1st i letters of X , Yj = 1st j letters of Y. Pi,j = penalty of optimal alignment of Xi & Yj
-- Recurrence: For all i = 1, ... , m and j = 1, ... , n:
Pij = min{ (1) Pxi yj + Pi-1,j-1 , (2) Pgap + Pi-1,j (3) Pgap + Pi,j-1 }
-- Base case : Pi,0 = P0,i = i * Pgap
5. The Algorithm
-- A = 2-D array.
A[i , 0] = A[0 , i] = i * Pgap for any i >= 0
For i = 1 to m
For j = 1 to n
A[i , j ] = min { A[i-1, j-1] + Pxiyi , A[i-1, j] + Pgap , A[i, j-1] + Pgap }
-- Running Time : O(mn)
6. Reconstructing a Solution
-- Trace back through filled-in table A, starting at A[m , n]
-- When you reach subproblem A[i , j ]:
-- If A[i , j ] filled using case (1), match xi & yj and go to A[i - 1 , j - 1]
-- If A[i , j ] filled using case (2), match xi with a gap and go to A[i - 1 , j ]
-- If A[i , j ] filled using case (3), match yj with a gap and go to A[i , j - 1]
[If i = 0 or j = 0, match remaining substring with gaps]
-- Running time is only O(m + n)
相关推荐
总结来说,"DNA Sequence Alignment Using Dynamic Programming by C#"涉及到将生物学的DNA序列比对问题与计算机科学的动态规划方法相结合,利用C#语言进行编程实现。这个过程涵盖了全局和局部比对算法、动态规划...
全局子序列比对 Global Sequence Alignment
多重序列比对(Multiple Sequence Alignment, MSA)是生物信息学中的一个重要概念,它涉及到对多个生物序列(如DNA、RNA或蛋白质序列)进行排列,以便揭示它们之间的同源性和进化关系。在生物进化研究、基因功能预测...
在生物信息学领域,蛋白质序列的多重比对(Multiple Sequence Alignment,MSA)是一项基础而关键的技术,它能够揭示大量蛋白质序列间的相似性和差异性,从而帮助科学家们分析保守基序、进行系统发育研究、预测结构...
Multiple Sequence Alignment Using Maximum Spanning Tree 生物信息学中一个重要topic:多序列联配。这里用最大生成树的思想来进行联配,算法为C语言实现。本程序在Dev-C++,Win Vista环境下实现。
Bowtie是一款在生物信息学领域广泛应用的Next Generation Sequence (NGS) 对齐软件,专门设计用于在Windows操作系统上高效地将高通量测序数据与基因组进行比对。NGS技术使得科学家们能够快速获取大量DNA序列信息,但...
一种通过点图比较两个蛋白质序列并显示相似性的工具。
Swift是一个DNA序列比对程序,使用Smith-Waterman算法产生缺口比对。 它以查询文件(FASTA格式)和参考文件(FASTA格式)为输入。 它输出参考名称,读取名称,空位比对,比对得分,比对开始和结束位置以及比对...
**Zoomable Sequence Alignment Editor(ZAE)开源项目详解** ZAE,全称为Zoomable Alignment Editor,是一款专为生物信息学领域设计的可缩放多序列比对编辑器。这款工具的核心功能在于它允许用户在不同尺度下查看...
### FPGASW: 加速大规模Smith-Waterman序列比对应用及回溯技术在FPGA线性阵列上的实现 #### 摘要与引言 本文介绍了一种利用现场可编程门阵列(FPGA)进行加速的Smith-Waterman (SW)序列比对算法——FPGASW。...
在生物信息学领域,DNA序列对齐是一项至关重要的任务,它涉及到比较和匹配不同生物样本的DNA序列,以便找出它们之间的相似性和差异性。在这个实验项目中,汪浩通过杨宁老师的指导,采用两种经典的算法——动态规划和...
The chapters discuss pairwise and multiple sequence alignment, fast database searches for homologous sequences, protein motif identification, genome rearrangement, physical mapping, phylogeny ...
You no longer need one program for restriction analyses and others for multiple sequence alignment, designing PCR primers, protein sequence analysis or drawing plasmids... DNAMAN carries out all ...
MolQuest is the most ... multiple sequence alignment, phylogenetic reconstruction, and a wide variety of other functions, for instance: 生物信息学工具软件包MolQuest trial,包括了107种常用的分析工具。
本文讨论了如何在CPU-GPU异构平台上并行化Smith-Waterman算法,以加速成对序列比对的应用。首先,我们需要了解Smith-Waterman算法是什么。Smith-Waterman算法是一种基于动态规划的算法,用于比较两个序列并找出它们...
MSAProbs是一种开放源代码的蛋白质多序列验证算法,与ClustalW,MAFFT,MUSCLE,ProbCons和Probalign相比,在流行的基准:BALIBASE,PREFAB,SABMARK,OXBENCH上达到了统计上最高的比对准确性。
FastMSA(Fast Multiple Sequence Alignment)是一个基于Python的高效多序列比对库,主要用于生物信息学领域。在版本0.3.2中,它提供了一种快速且可靠的解决方案,用于处理大规模的核酸或蛋白质序列比对任务。这个库...
在生物信息学领域,序列比对(Sequence Alignment)是一项核心任务,它涉及到寻找两个或多个生物序列之间的相似性。BLAST(Basic Local Alignment Search Tool)是用于快速比对核酸和蛋白质序列的一种流行方法,尤其...