Levenshtein distance (or edit distance) between two strings is the number of deletions, insertions, or substitutions required to transform source string into target string. For example, if the source string is "book" and the target string is "back," to transform "book" to "back," you will need to change first "o" to "a," second "o" to "c," without additional deletions and insertions. Thus, the Levenshtein distance between "book" and "back" will be 2.
More information about the Levenshtein distance algorithm and its applications can be found below the calculator.
Levenshtein Distance Algorithms and Applications
The Levenshtein distance between two strings a, b (of length |a| and |b| respectively) is given by lev(a,b) where
The Levenshtein distance is named after the Russian scientist Vladimir Levenshtein, who devised the metric in 1965. There are several algorithms to compute the Levenshtein distance:
- Recursive; the straightforward algorithm, which follows the definition
- Iterative with full matrix; the one used in the calculator above
- Iterative with two matrix rows
More details and pseudocode implementations for all algorithms can be found in Wikipedia article Levenshtein distance
It has been shown1 that the Levenshtein distance cannot be computed in strongly subquadratic time, which makes its usage for comparing long strings impractical, because computation cost will be proportional to the product of string lengths. However, the edit distance can be used to find matches of a short string, for example, taken from the dictionary, in a long string. This is useful for spell checkers, correction systems for optical character recognition, and similar products.
Backurs, Arturs; Indyk, Piotr (2015). Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false). Forty-Seventh Annual ACM on Symposium on Theory of Computing (STOC). ↩