1. Problem Definition:
a) Input: Strings X = x1 ... xm, Y = y1 ... yn over some alphabet Sigma (like {A,C,G,T})
b) Penalty Pgap for inserting a gap, Pab for matching a & b [presumably Pab = 0 of a = b]
c) Feasible solutions: Alignments - i.e., insert gaps to equalize lengths of the string
d) Goal: Alignment with minimum possible total penalty
e) Example :
2. A Dynamic Programming Approach
-- Key step: Identify subproblems. As usual, will look at structure of an optimal solution for clues.
[i.e., develop a recurrence + then reverse engineer the subproblems]
-- Structure of optimal solution: Consider an optimal alignment of X , Y and its final position:
-- Three cases :
a) Case 1: xm , yn matched
b) Case 2: xm matched with a gap
c) Case 3: yn matched with a gap
[Pointless to have 2 gaps]
-- Optimal substructure: Let X' = X - xm, Y' = Y - yn.
a) If case (1) holds, then induced alignment of X' & Y' is optimal.
b) If case (2) holds, then induced alignment of X' & Y is optimal.
c) If case (3) holds, then induced alignment of X & Y' is optimal.
3. Optimal Substructure Proof
-- Proof of Case 1, other cases are similar :
By contradiction. Suppose induced alignment of X' , Y' has penalty P while some other one has penalty P* < P.
==> Appending xm , yn to the latter, get an alignment of X and Y with penalty P + Pxmyn < P + Pxmyn ==> Contradicts optimality of original alignment of X & Y .
4. The Recurrence
-- Notation: Let Xi = 1st i letters of X , Yj = 1st j letters of Y. Pi,j = penalty of optimal alignment of Xi & Yj
-- Recurrence: For all i = 1, ... , m and j = 1, ... , n:
Pij = min{ (1) Pxi yj + Pi-1,j-1 , (2) Pgap + Pi-1,j (3) Pgap + Pi,j-1 }
-- Base case : Pi,0 = P0,i = i * Pgap
5. The Algorithm
-- A = 2-D array.
A[i , 0] = A[0 , i] = i * Pgap for any i >= 0
For i = 1 to m
For j = 1 to n
A[i , j ] = min { A[i-1, j-1] + Pxiyi , A[i-1, j] + Pgap , A[i, j-1] + Pgap }
-- Running Time : O(mn)
6. Reconstructing a Solution
-- Trace back through filled-in table A, starting at A[m , n]
-- When you reach subproblem A[i , j ]:
-- If A[i , j ] filled using case (1), match xi & yj and go to A[i - 1 , j - 1]
-- If A[i , j ] filled using case (2), match xi with a gap and go to A[i - 1 , j ]
-- If A[i , j ] filled using case (3), match yj with a gap and go to A[i , j - 1]
[If i = 0 or j = 0, match remaining substring with gaps]
-- Running time is only O(m + n)
相关推荐
DNA Sequence Alignment Using Dynamic Programming by C#
全局子序列比对 Global Sequence Alignment
给定两篇2万字以上的论文(.txt), 用前述的Sequence Alignment算法计算其Edit Distance.利用Sequence Alignment算法计算两篇文章的Edit Distance
用于下一代测序的sequence alignment 软件,在windows上运行
Kalign – an accurate and fast multiple sequence alignment algorithm
生物信息学, 多重序列比对程序. 本资源包含源代码及基本序列数据.基于跌代方法得到MSA
Multiple Sequence Alignment Using Maximum Spanning Tree 生物信息学中一个重要topic:多序列联配。这里用最大生成树的思想来进行联配,算法为C语言实现。本程序在Dev-C++,Win Vista环境下实现。
FPGASW: Accelerating Large‑Scale Smith–Waterman Sequence Alignment Application with Backtracking on FPGA Linear Systolic Array
一种通过点图比较两个蛋白质序列并显示相似性的工具。
Swift是一个DNA序列比对程序,使用Smith-Waterman算法产生缺口比对。 它以查询文件(FASTA格式)和参考文件(FASTA格式)为输入。 它输出参考名称,读取名称,空位比对,比对得分,比对开始和结束位置以及比对...
Gene sequence alignment, used to recognize the homology and variability in different species, is an important part of Bioinformatics. Creating indexes is a crucial step of gene sequence alignment ...
Parallel Smith-Waterman Algorithm for Pairwise Sequence Alignment on CPU-GPU heterogeneous platform
动态规划算法实现:花费时间以及动态规划结果,得到相应的对齐序列。 分治算法:由算法得到花费时间
ZAE是可缩放的多序列比对编辑器
The chapters discuss pairwise and multiple sequence alignment, fast database searches for homologous sequences, protein motif identification, genome rearrangement, physical mapping, phylogeny ...
You no longer need one program for restriction analyses and others for multiple sequence alignment, designing PCR primers, protein sequence analysis or drawing plasmids... DNAMAN carries out all ...
MolQuest is the most ... multiple sequence alignment, phylogenetic reconstruction, and a wide variety of other functions, for instance: 生物信息学工具软件包MolQuest trial,包括了107种常用的分析工具。
MSAProbs是一种开放源代码的蛋白质多序列验证算法,与ClustalW,MAFFT,MUSCLE,ProbCons和Probalign相比,在流行的基准:BALIBASE,PREFAB,SABMARK,OXBENCH上达到了统计上最高的比对准确性。
Performing a sequence alignment using BLAST method.
目录 从 BAM 文件创建 FASTQ 文件BAM文件的随机子采样计算读取次数获取基因组序列比较 BAM 文件转换引用名称覆盖范围随着时间的推移观星者由gh-md-toc创建2021 年 7 月 21 日星期三 11:44:40 日本标准时间 学习 BAM ...