hello-algo/en/docs/chapter_dynamic_programming/edit_distance_problem.md
Yudong Jin b01036b09e Revisit the English version (#1885)
2026-04-10 23:03:03 +08:00


Edit Distance Problem

Edit distance, also known as Levenshtein distance, is the minimum number of edits required to transform one string into another. It is commonly used in information retrieval and natural language processing to measure the similarity between two sequences.

!!! question

Given two strings $s$ and $t$, return the minimum number of edits required to transform $s$ into $t$.

You can perform three types of edit operations on a string: insert a character, delete a character, or replace a character with any other character.

As shown in the figure below, transforming kitten into sitting requires 3 edits (2 replacements and 1 insertion), and transforming hello into algo also requires 3 edits (2 replacements and 1 deletion).

Example data for edit distance
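To make the three edit operations concrete, here is a minimal Python sketch. The helper names `insert`, `delete`, and `replace` are hypothetical and chosen only for illustration, and the edit sequences are one possible choice of 3 edits for each pair; other sequences of the same length may exist.

```python
def insert(s: str, i: int, c: str) -> str:
    """Insert character c before index i of s."""
    return s[:i] + c + s[i:]


def delete(s: str, i: int) -> str:
    """Delete the character at index i of s."""
    return s[:i] + s[i + 1:]


def replace(s: str, i: int, c: str) -> str:
    """Replace the character at index i of s with c."""
    return s[:i] + c + s[i + 1:]


# kitten -> sitting: 2 replacements and 1 insertion
k1 = replace("kitten", 0, "s")  # "sitten"
k2 = replace(k1, 4, "i")        # "sittin"
k3 = insert(k2, 6, "g")         # "sitting"

# hello -> algo: 1 deletion and 2 replacements
h1 = delete("hello", 0)         # "ello"
h2 = replace(h1, 0, "a")        # "allo"
h3 = replace(h2, 2, "g")        # "algo"

print(k3, h3)
```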

The edit distance problem can be naturally explained using the decision tree model. Strings correspond to tree nodes, and each edit operation corresponds to an edge in the tree.

As shown in the figure below, without restricting operations, each node can branch into many edges, with each edge corresponding to one operation, meaning there are many possible paths to transform hello into algo.

From the perspective of the decision tree, the goal of this problem is to find the shortest path between node hello and node algo.

Representing edit distance problem based on decision tree model
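The shortest-path view can be sketched directly as a breadth-first search over strings. This is an illustrative brute force, not the approach developed in this section, and its running time grows exponentially with the distance. The function name `edit_distance_bfs` is hypothetical. Two pruning rules are assumptions of this sketch: only characters that occur in the target are inserted or substituted, and intermediate strings never grow beyond the longer of the two inputs. Both keep at least one shortest path intact, because an optimal edit sequence can always be ordered as deletions, then replacements, then insertions.

```python
from collections import deque


def edit_distance_bfs(s: str, t: str) -> int:
    """Shortest path from s to t in the decision tree, found by BFS."""
    alphabet = sorted(set(t))      # assumption: other characters never help
    max_len = max(len(s), len(t))  # assumption: cap on intermediate length
    queue = deque([(s, 0)])
    visited = {s}
    while queue:
        cur, steps = queue.popleft()
        if cur == t:
            return steps
        neighbors = []
        for i in range(len(cur)):
            neighbors.append(cur[:i] + cur[i + 1:])  # delete cur[i]
            for c in alphabet:
                neighbors.append(cur[:i] + c + cur[i + 1:])  # replace cur[i]
        if len(cur) < max_len:
            for i in range(len(cur) + 1):
                for c in alphabet:
                    neighbors.append(cur[:i] + c + cur[i:])  # insert c at i
        for nxt in neighbors:
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, steps + 1))
    return -1  # not expected: t is always reachable from s


print(edit_distance_bfs("hello", "algo"))
```

Each queue entry is one node of the decision tree, and each generated neighbor is one edge. The `visited` set merges repeated strings, which is what turns the tree into a graph small enough to search for short inputs.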

Dynamic Programming Approach

Step 1: Think about the decisions in each round, define the state, and thus obtain the dp table

Each round of decision involves performing one edit operation on string s.

We want the problem size to gradually decrease during the editing process so that we can construct subproblems. Let the lengths of strings s and t be n and m respectively. We first consider the tail characters of the two strings, s[n-1] and t[m-1].

  • If s[n-1] and t[m-1] are the same, we can skip them and directly consider s[n-2] and t[m-2].
  • If s[n-1] and t[m-1] are different, we need to perform one edit on s (insert, delete, or replace) to make the tail characters of the two strings the same, allowing us to skip them and consider a smaller-scale problem.

In other words, each round of decision (edit operation) we make on string s will change the remaining characters to be matched in s and t. Therefore, the state is the $i$-th and $j$-th characters currently being considered in s and t, denoted as [i, j].

State [i, j] corresponds to the subproblem: the minimum number of edits required to change the first i characters of s into the first j characters of $t$.

From this, we obtain a two-dimensional dp table of size $(n+1) \times (m+1)$, where the extra row and column represent the empty prefixes of s and t.

Step 2: Identify the optimal substructure, and then derive the state transition equation

Consider subproblem dp[i, j], where the tail characters of the corresponding two strings are s[i-1] and t[j-1], which can be divided into the three cases shown in the figure below based on different edit operations.

  1. Insert t[j-1] after s[i-1], then the remaining subproblem is dp[i, j-1].
  2. Delete s[i-1], then the remaining subproblem is dp[i-1, j].
  3. Replace s[i-1] with t[j-1], then the remaining subproblem is dp[i-1, j-1].

State transition for edit distance

Based on the above analysis, we obtain the optimal substructure: the minimum number of edits for dp[i, j] equals the minimum of dp[i, j-1], dp[i-1, j], and dp[i-1, j-1], plus the current edit cost of 1. The corresponding state transition equation is:


$$
dp[i, j] = \min(dp[i, j-1], dp[i-1, j], dp[i-1, j-1]) + 1
$$

Please note that when s[i-1] and t[j-1] are the same, no edit is required for the current character, in which case the state transition equation is:


$$
dp[i, j] = dp[i-1, j-1]
$$
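The two transition cases can be read almost literally as a recursive function. The sketch below is a memoized recursion in Python; the name `edit_distance_memo` is hypothetical. It assumes the natural base cases: turning an empty prefix into a prefix of length j costs j insertions, and turning a prefix of length i into an empty prefix costs i deletions.

```python
from functools import lru_cache


def edit_distance_memo(s: str, t: str) -> int:
    """Top-down evaluation of the state transition equation."""

    @lru_cache(maxsize=None)
    def dp(i: int, j: int) -> int:
        # Base cases: one of the two prefixes is empty
        if i == 0:
            return j
        if j == 0:
            return i
        # Tail characters match: no edit needed for this position
        if s[i - 1] == t[j - 1]:
            return dp(i - 1, j - 1)
        # Otherwise take the cheapest of insert, delete, replace, plus 1
        return min(dp(i, j - 1), dp(i - 1, j), dp(i - 1, j - 1)) + 1

    return dp(len(s), len(t))


print(edit_distance_memo("kitten", "sitting"))
```

Memoization ensures each state [i, j] is computed once, so the recursion visits at most $(n+1) \times (m+1)$ states.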

Step 3: Determine boundary conditions and state transition order

When both strings are empty, the number of edit steps is 0, i.e., dp[0, 0] = 0. When s is empty but t is not, the minimum number of edit steps equals the length of t, i.e., the first row dp[0, j] = j. When s is not empty but t is empty, the minimum number of edit steps equals the length of s, i.e., the first column dp[i, 0] = i.

Observing the state transition equation, the solution dp[i, j] depends on solutions to the left, above, and upper-left, so the entire dp table can be traversed in order through two nested loops.
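As a small illustration of the boundary conditions alone, the following Python sketch builds the table and fills only the first row and first column. The helper name `init_dp_table` is hypothetical; the interior cells are left at 0 here and would be filled row by row, left to right.

```python
def init_dp_table(n: int, m: int) -> list[list[int]]:
    """Create an (n+1) x (m+1) table with only the boundary states filled."""
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i  # first column: delete all i characters of s
    for j in range(1, m + 1):
        dp[0][j] = j  # first row: insert all j characters of t
    return dp


table = init_dp_table(len("hello"), len("algo"))
for row in table:
    print(row)
```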

Code Implementation

[file]{edit_distance}-[class]{}-[func]{edit_distance_dp}
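The placeholder above is expanded into source code when the book is built. As a stand-in, here is a minimal Python sketch of the three steps described so far. It reuses the function name `edit_distance_dp` from the placeholder, but it is written from the description in this section and is not necessarily identical to the repository's implementation.

```python
def edit_distance_dp(s: str, t: str) -> int:
    """Edit distance via a two-dimensional dp table."""
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    # Boundary conditions: first column and first row
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    # State transition: fill the remaining rows and columns in order
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                # Tail characters match: skip them
                dp[i][j] = dp[i - 1][j - 1]
            else:
                # Minimum of insert, delete, replace, plus one edit
                dp[i][j] = min(dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1]) + 1
    return dp[n][m]


print(edit_distance_dp("hello", "algo"))
```

Both the time and space complexity of this sketch are $O(nm)$, since every cell of the table is computed once and kept.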

As shown in the figure below, the state transition process for the edit distance problem is very similar to that of the knapsack problem; both can be viewed as the process of filling a two-dimensional grid.

Dynamic programming process for edit distance (figure in 15 steps)

Space Optimization

dp[i, j] depends on the state above dp[i-1, j], the state to the left dp[i, j-1], and the upper-left state dp[i-1, j-1]. If the table is compressed into a single row, forward traversal overwrites the upper-left state dp[i-1, j-1] before it is used, while reverse traversal has not yet computed dp[i, j-1] when it is needed. Neither traversal order works on its own.

For this reason, we can use a variable leftup to temporarily store the upper-left solution dp[i-1, j-1], so we only need to consider the solutions to the left and above. This situation is the same as in the unbounded knapsack problem, so we can use forward traversal. The code is as follows:

[file]{edit_distance}-[class]{}-[func]{edit_distance_dp_comp}
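The placeholder above is likewise expanded at build time. A minimal Python sketch of the single-row version follows, using the function name `edit_distance_dp_comp` from the placeholder and the variable `leftup` named in the text. The variable `temp` is an addition of this sketch, and the code is not necessarily identical to the repository's implementation.

```python
def edit_distance_dp_comp(s: str, t: str) -> int:
    """Edit distance with the dp table compressed to one row."""
    n, m = len(s), len(t)
    dp = list(range(m + 1))  # first row: dp[0, j] = j
    for i in range(1, n + 1):
        leftup = dp[0]  # holds dp[i-1, 0] before it is overwritten
        dp[0] = i       # first column: dp[i, 0] = i
        for j in range(1, m + 1):
            temp = dp[j]  # dp[i-1, j], the upper-left state for column j+1
            if s[i - 1] == t[j - 1]:
                dp[j] = leftup
            else:
                # dp[j-1] is the left state, dp[j] is still the state above
                dp[j] = min(dp[j - 1], dp[j], leftup) + 1
            leftup = temp
    return dp[m]


print(edit_distance_dp_comp("hello", "algo"))
```

The time complexity stays at $O(nm)$, while the space complexity drops to $O(m)$ because only one row of length $m+1$ is kept.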