当前位置:首页 >> 工学 >>

算法导论15-5编辑距离解答(含程序代码)


1515-5 编辑距离

a)Given a)Given two sequences x[1 ‥ m] and y[1 ‥ n] and set of transformationtransformation-operation costs, the edit distance from x to y is the cost of the least expensive operation sequence dynamicthat transforms x to y. Describe a dynamic-programming algorithm that finds the edit distance from x[1 ‥ m] to y[1 ‥ n] and prints an optimal operation sequence. Analyze the running time and space requirements of your algorithm.
First,,set X[i]=x[1..i] 、 Y[j]=y[1..j].The edit distance‘s child problem is actually from X [i] to Y [j] optimal conversion, We will call it "X [i] -> Y [i] problem " 。 Obviously,The original problem is actually X [m] -> Y [n].Set c[i, j] Said that resolution of the X [i] -> Y [j] problem of the optimal cost,Then under settlement X[i]->Y[j] On the last step of the operation,The following recurrence relations can be: If the last step of the operation is COPY,Then there must be x [i] = y [j].The optimal solution to this problem depends on the X [i-1] -> Y [j-1] the optimal solution, so there is c [i, j] = c [i-1, j-1] + cost (COPY); If the last step is to REPLACE, then there must be x [i]! = y [i]. (This assumes that the implementation can not use the same letters replace operation.) optimal solution to this problem also depends on x [i-1] -> Y [j-1] the optimal solution, so there is c [i, j] = c [i-1, j-1] + cost (REPLACE); If the last step is TWIDDLE, then there must be x [i] = y [j-1] and x [i-1] = y [j], and asked i, j> = 2. At this time, the optimal solution of this problem depends on the X [i-2] -> Y [i-2] the optimal solution, so there is c [i, j] = c [i-2, j-2] + cost (TWIDDLE);

If the last step of the operation is DELETE, the operation on the x or y is not restricted. Optimal solution to this problem

depends on the X [i-1] -> Y [j] the optimal solution, so there is c [i, j] = c [i-1, j] + cost (DELETE); If the last step of the operation is INSERT, as there is no limit on the x and y. Optimal solution to this problem depends on the X [i] -> Y [j-1] the optimal solution, so there is c [i, j] = c [i, j-1] + cost (INSERT); If the last step is to KILL, it means that we have completed the X [m] to Y [n] of the conversion, so the current problem must be X [m] -> Y [n] issue. That is, at this time there must be i = m and j = n. Optimal solution to this problem depends on the X [k] -> Y [n] the optimal solution (0 <= k <m), so there is c [m, n] = min (0 <= k <m) {c [k, n] + cost (KILL)}. X [0] and Y [0] represents the empty string, from X [0] switch to Y [j] INSERT operations require j times, so c [0, j] = j * cost (INSERT). Similarly, from X [i] to switch to Y [0] need to DELETE operation i times, then c [i, 0] = i * cost (DELETE). When i = j = 0 时, c [i, j] = c [0, 0] = 0, empty string into the cost of an empty string to 0. For i, j> 0, c [i, j] has the following recurrence formula:
c[i, j]=min{ c[i-1, j-1]+cost(COPY) (x[i]=y[j]), c[i-1, j-1]+cost(REPLACE) (x[i]!=y[j]), c[i-2, j-2]+cost(TWIDDLE) (i, j>=2 and x[i]=y[j-1] and x[i-1]=y[j]), c[i-1, j]+cost(DELETE), c[i, j-1]+cost(INSERT), min(0<=k<m){c[k, n]+cost(KILL)} (i=m and j=n). }

In addition, the set op [i, j] Save to solve X [i] -> Y [j] the issue of the last step. EDIT-DISTANCE(x, y, m, n) for i<-0 to m do c[i, 0]<-i*cost(DELETE) op[i, 0]<-DELETE for j<-0 to n do c[0, j]<-j*cost(INSERT) op[0, j]<-INSERT for i<-1 to m do for j<-1 to n

do c[i, j]<-∞ if x[i]=y[j] then c[i, j]<-c[i-1, j-1]+cost(COPY) op[i, j]<-COPY if x[i]!=y[j] and c[i-1, j-1]+cost(REPLACE)<c[i, j] then c[i, j]<-c[i-1, j-1]+cost(REPLACE) op[i, j]<-REPLACE(y[j]) if i>=2 and j>=2 and x[i]=y[j-1] and x[i-1]=y[j] and c[i-2, j-2]+cost(TWIDDLE)<c[i, j] then c[i, j]<-c[i-2, j-2]+cost(TWIDDLE) op[i, j]<-TWIDDLE if c[i-1, j]+cost(DELETE)<c[i, j] then c[i, j]<-c[i-1, j]+cost(DELETE) op[i, j]<-DELETE if c[i, j-1]+cost(INSERT)<c[i, j] then c[i, j]<-c[i, j-1]+cost(INSERT) op[i, j]<-INSERT(y[j]) for i<-0 to m-1 do if c[i, n]+cost(KILL)<c[m, n] then c[m, n]<-c[i, n]+cost(KILL) op[m, n]<-KILL(i) return c and op

Obviously, this algorithm time and space complexity are Θ (mn). We use the EDIT-DISTANCE return the table to reconstruct the sequence of operations op. OP-SEQUENCE (op, i, j) Reconstruction of X [i] -> Y [j] the optimal solution, the boundary conditions are i = j = 0. The outermost call OP-SEQUENCE (op, m, n)
OP-SEQUENCE(op, i, j) if i=0 and j=0 then return if op[i, j]=COPY or op[i, j]=REPLACE then OP-SEQUENCE(op, i-1, j-1) else if op[i, j]=TWIDDLE then OP-SEQUENCE(op, i-2, j-2) else if op[i, j]=DELETE then OP-SEQUENCE(op, i-1, j) else if op[i, j]=INSERT then OP-SEQUENCE(op, i, j-1) else if op[i, j]=KILL(k) then OP-SEQUENCE(op, k, j) print op[i, j]

The edit-distance problem is a generalization of the problem of aligning two DNA sequences (see, for example, Setubal and Meidanis [272, Section 3.2]). There are several methods for measuring the similarity of two DNA sequences by aligning them. One such method to align two sequences x and y consists of inserting spaces at arbitrary locations in the two sequences (including at either end) so that the resulting sequences x′ and y′ have the same length but do not have a space in the same position (i.e., for no position j are both x′[j] and y′[j] a space.) Then we assign a "score" to each position. Position j receives a score as follows: +1 if x′[j] = y′[j] and neither is a space, -1 if x′[j] ≠ y′[j] and neither is a space, -2 if either x′[j] or y′[j] is a space. The score for the alignment is the sum of the scores of the individual positions. For example, given the sequences x = GATCGGCAT and y = CAATGTGAATC, one alignment is G ATCG GCAT CAAT GTGAATC -*++*+*+-++* A + under a position indicates a score of +1 for that position, a - indicates a score of -1, and a * indicates a score of -2, so that this alignment has a total score of 6 · 1 - 2 · 1 4 · 2 _ -4.

b) Explain how to cast the problem of finding an optimal alignment as an edit distance problem using a subset of the transformation operations copy, replace, delete, insert, twiddle, and kill.
DNA sequence edit distance problem is essentially the problem: cost(COPY) = -1, cost(REPLACE) = +1, cost(DELETE) cost(INSERT) = +2, = +2,

TWIDDLE operation and KILL operation is not allowed.

DNA sequence alignment of the two biggest points to the operation of the A string is converted to B by the string edit distance of the opposite number.


相关文章:
算法导论15-5编辑距离解答(含程序代码).doc
算法导论15-5编辑距离解答(含程序代码) - 1515-5 编辑距离 a)Gi
编辑距离问题-算法导论.doc
{Case1, Case2, Case3, Case4} 二、主程序代码 #include <iostream> #...算法导论编辑距离讲解 暂无评价 1页 1下载券 算法导论15-5编辑距离解... ...
算法导论编辑距离讲解.doc
算法导论编辑距离讲解_计算机软件及应用_IT/计算机_专业资料。Ttitle:动态规划求最短编辑距离问题描述:已知字符串 X 和 Y,要求使用最少的字符串操 作将字符串 X...
算法导论实验报告:编辑距离与最佳基因串匹配.pdf
} 详细代码见附件源文件 GlobalSequenceAlignment.cpp 【程序运行结果】 Test my...算法导论15-5编辑距离解... 5页 免费 算法设计动态规划(编辑距... 7页 ...
算法作业讲解_图文.ppt
算法作业讲解_研究生入学考试_高等教育_教育专区。算法导论习题讲解 ...15-5:编辑距离问题。(题目太长,略) 解答要点: (a)递归求解公式为: if x[...
算法导论.doc
算法导论 - 算法 1 算法导论 15-3 动态规划求最短编辑距离 (2010-
算法导论上机报告.doc
算法导论上机报告_计算机软件及应用_IT/计算机_专业资料。编辑距离编辑距离(Edit ...代码部分: #include <assert.h> #include <stdio.h> #include <stdlib.h> ...
算法设计动态规划(编辑距离).doc
编辑距离问题 动态规划 编辑距离 问题 ...5 */ 、复杂度分析 以下为核心代码部分: for(...等著潘金贵等译.算法导论(第二版).机械工业出版社,...
算法导论课程设计_图文.doc
2006. 12 附件程序代码: package 编辑距离; import...x, y), z); } return C[n][m]; } } 15...算法导论课程设计_齐海 暂无评价 10页 5下载券 ...
算法导论.doc
高端算法, 联想到本科学 过的 C 语言、程序设计等...动态规划的基本思想和字 符串的编辑距离等例子,还有...15页 1下载券 算法导论答案 70页 免费 ...
算法导论.doc
编辑距离】两个字符串 s1,s2 的编辑距离是指把...设计一个拼写纠错的程序】 【思路】 字典以字母键...贪心算法导论(完整版) 9页 5下载券 算法导论作业...
《算法导论》读书笔记之第2章 算法入门.pdf
算法导论》读书笔记之第 13章 红黑树(9) 5. ...程序执行结果如下: 方法2:网上课后习题答案上面给的...(6) 编辑 收藏 评论列表 #1楼 2013-01-26 15:...
更多相关标签: