python计算字符串莱文斯坦比
Posted On 2018-01-04
对str1和str2,增加、删除字符记操作数+1,替换字符记操作数+2,从str1变成str2所需的最短操作数为n,则str1和str2的相似度为\frac{n}{len(str1)+len(str2)}。
python的Levenshtein库的ratio函数实现了这个功能。自己使用python实现如下:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
def levenshtein(first,second): if len(first) > len(second): first,second = second,first #first = first[0] + first #second = second[0] + second first_length = len(first) + 1 second_length = len(second) + 1 distance_matrix = [range(second_length) for x in range(first_length)] for i in range(first_length): for j in range(second_length): if min(i,j) == 0: distance_matrix[i][j] = max(i,j) else: deletion = distance_matrix[i-1][j] + 1 insertion = distance_matrix[i][j-1] + 1 substitution = distance_matrix[i-1][j-1] if first[i-1] != second[j-1]: substitution += 2 distance_matrix[i][j] = min(insertion,deletion,substitution) return distance_matrix[first_length-1][second_length-1] def rotio(a,b): if len(a)*len(b)==0: return 0 return 1-float(levenshtein(a,b))/(len(a)+len(b)) |