• home
  • about
  • 全ての投稿
  • ソフトウェア・ハードウェアの設定のまとめ
  • 分析関連のまとめ
  • ヘルスケア関連のまとめ
  • 生涯学習関連のまとめ

Longest common subsequence problem

date: 2021-03-26 excerpt: Longest common subsequence problem(最長共通部分列問題)について

tag: lcslongest common subsequencealgorithmdynamic programming


Longest common subsequence problem(最長共通部分列問題)について

  • diffコマンドの基幹のアルゴリズム
  • 2つの列の最長共通部分を見つけるアルゴリズム
  • 最小コスト弾性マッチング
  • レーベンシュタイン距離

For example, consider the sequences (ABCD) and (ACBAD). They have 5 length-2 common subsequences: (AB), (AC), (AD), (BD), and (CD); 2 length-3 common subsequences: (ABD) and (ACD); and no longer common subsequences. So (ABD) and (ACD) are their longest common subsequences.

具体的な挙動

  • edit distance(編集距離)に似た概念であるが、編集距離が最も小さいコストの動的計画法なら、lcsは最もコストの大きい経路を計算する
  • 内容が一致したとき、左上(i-1, j-1)のインデックスの値+1

pythonのコード

import numpy as np

def LCS2(s1, s2, n1, n2):
    """ Finds the length of the longest common subsequence of two strings
    (str, str, int, int) -> (int, 2D-array) """
    Matrix = np.zeros((n1+1 , n2+1))
    for i in range(1, n1+1):
        for j in range(1, n2+1):
            if s1[i-1] == s2[j-1]:
                Matrix[i][j] = Matrix[i-1][j-1] + 1
            if s1[i-1] != s2[j-1]:
                Matrix[i][j] = max(Matrix[i][j-1], Matrix[i-1][j])
    return (int(Matrix[n1][n2]), Matrix)

例; そのまま

問題
Longest Common Subsequence

解答

def lcs(s1, s2):
    n1, n2 = len(s1), len(s2)
    M = [[0]*(n2+1) for _ in range(n1+1)]
    for i in range(1, n1+1):
        for j in range(1, n2+1):
            if s1[i-1] == s2[j-1]:
                M[i][j] = M[i-1][j-1] + 1
            if s1[i-1] != s2[j-1]:
                M[i][j] = max(M[i][j-1], M[i-1][j])
    return (int(M[n1][n2]), M)

def solve():
    s1 = input()
    s2 = input()
    size, M = (lcs(s1, s2))
    print(size)

N=int(input())
for _ in range(N):
    solve()

参考

  • Longest common subsequence problem
  • 最長共通部分列問題

例; レーベンシュタイン距離

問題

  • 文字列変更(medium)

解答

def levenshtein_distance(s1, s2):
    n1, n2 = len(s1), len(s2)
    dp = [[float("inf")]*(n2+1) for _ in range(n1+1)]
    for i in range(n1 + 1):
        dp[i][0] = i
    for j in range(n2 + 1):
        dp[0][j] = j
    for i in range(n1):
        for j in range(n2):
            # 変更操作
            if s1[i] == s2[j]:
                dp[i + 1][j + 1] = min(dp[i + 1][j + 1], dp[i][j])
            else:
                dp[i + 1][j + 1] = min(dp[i + 1][j + 1], dp[i][j] + 1)
            # 削除操作
            dp[i + 1][j + 1] = min(dp[i + 1][j + 1], dp[i][j + 1] + 1)
            # 挿入操作
            dp[i + 1][j + 1] = min(dp[i + 1][j + 1], dp[i + 1][j] + 1)

    return (dp[n1][n2], dp)

def solve():
    s1 = input()
    s2 = input()
    opnum, dp = (levenshtein_distance(s1, s2))
    print(opnum)

N,M=map(int,input().split())
solve()


lcslongest common subsequencealgorithmdynamic programming Share Tweet