Sequence Alignment

leonzhx

浏览: 773432 次
性别:
来自: 上海

最近访客更多访客>>

u012363178

justsimple

cdphantom

wang_xuewu

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

2014-05 ( 22)
2014-04 ( 47)
2014-03 ( 25)
更多存档...

博客分类：

Algorithms II -- Standford 学习笔记

Sequence Alignment Editor Distance Dynamic Programming Needleman-Wunsch Score Dynamic Planning

1. Problem Definition:

a) Input: Strings X = x1 ... xm, Y = y1 ... yn over some alphabet Sigma (like {A,C,G,T})

b) Penalty Pgap for inserting a gap, Pab for matching a & b [presumably Pab = 0 of a = b]

c) Feasible solutions: Alignments - i.e., insert gaps to equalize lengths of the string

d) Goal: Alignment with minimum possible total penalty

e) Example :

2. A Dynamic Programming Approach

-- Key step: Identify subproblems. As usual, will look at structure of an optimal solution for clues.

[i.e., develop a recurrence + then reverse engineer the subproblems]

-- Structure of optimal solution: Consider an optimal alignment of X , Y and its final position:

-- Three cases :

a) Case 1: xm , yn matched

b) Case 2: xm matched with a gap

c) Case 3: yn matched with a gap

[Pointless to have 2 gaps]

-- Optimal substructure: Let X' = X - xm, Y' = Y - yn.

a) If case (1) holds, then induced alignment of X' & Y' is optimal.

b) If case (2) holds, then induced alignment of X' & Y is optimal.

c) If case (3) holds, then induced alignment of X & Y' is optimal.

3. Optimal Substructure Proof

-- Proof of Case 1, other cases are similar :

By contradiction. Suppose induced alignment of X' , Y' has penalty P while some other one has penalty P* < P.

==> Appending xm , yn to the latter, get an alignment of X and Y with penalty P + Pxmyn < P + Pxmyn ==> Contradicts optimality of original alignment of X & Y .

4. The Recurrence

-- Notation: Let Xi = 1st i letters of X , Yj = 1st j letters of Y. Pi,j = penalty of optimal alignment of Xi & Yj

-- Recurrence: For all i = 1, ... , m and j = 1, ... , n:

Pij = min{ (1) Pxi yj + Pi-1,j-1 , (2) Pgap + Pi-1,j (3) Pgap + Pi,j-1 }

-- Base case : Pi,0 = P0,i = i * Pgap

5. The Algorithm

-- A = 2-D array.

A[i , 0] = A[0 , i] = i * Pgap for any i >= 0

For i = 1 to m

For j = 1 to n

A[i , j ] = min { A[i-1, j-1] + Pxiyi , A[i-1, j] + Pgap , A[i, j-1] + Pgap }

-- Running Time : O(mn)

6. Reconstructing a Solution

-- Trace back through filled-in table A, starting at A[m , n]

-- When you reach subproblem A[i , j ]:

-- If A[i , j ] filled using case (1), match xi & yj and go to A[i - 1 , j - 1]

-- If A[i , j ] filled using case (2), match xi with a gap and go to A[i - 1 , j ]

-- If A[i , j ] filled using case (3), match yj with a gap and go to A[i , j - 1]

[If i = 0 or j = 0, match remaining substring with gaps]

-- Running time is only O(m + n)

查看图片附件

分享到：

Optimal Binary Search Trees | The Knapsack Problem

2013-10-04 22:29
浏览 999
评论(0)
分类:编程语言
查看更多

发表评论

您还没有登录,请您登录后再发表评论

最近访客更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论