Any way of calculating the Levenshtein distance
Hi
I'm fairly new to JASP and statistics, so please bear with me. :)
I'm trying to find a way of calculating either the Levenshtein distance or the Damerau-Levenshtein distance in JASP. Does that exist?
I know that R has this (the stringdist() function). The stringdist() function apparently takes two strings as arguments and returns the Levenshtein distance between them. Would it be possible to use this in JASP's R console? I've never used the console myself, so I will have to figure out how all that works.
Many thanks!
Comments
Hi IamAnna,
I doubt we have this. Would be a good feature request on our GitHub page (for details see https://jasp-stats.org/2018/03/29/request-feature-report-bug-jasp/). But perhaps the R console option also works -- I'll ask the team.
EJ
We don't have the R package stringdist available, so that unfortunately won't work. Nevertheless, I asked ChatGPT and it suggested the following:
    levenshtein_distance <- function(str1, str2) {
      len1 <- nchar(str1)
      len2 <- nchar(str2)

      # Create a matrix to store the distances
      dist_matrix <- matrix(0, nrow = len1 + 1, ncol = len2 + 1)

      # Initialize the first row and first column of the matrix
      dist_matrix[1, ] <- 0:len2
      dist_matrix[, 1] <- 0:len1

      # Fill in the rest of the matrix
      # (seq_len() keeps the loops from running when a string is empty)
      for (i in seq_len(len1) + 1) {
        for (j in seq_len(len2) + 1) {
          cost <- ifelse(substr(str1, i - 1, i - 1) != substr(str2, j - 1, j - 1), 1, 0)
          dist_matrix[i, j] <- min(dist_matrix[i - 1, j] + 1,
                                   dist_matrix[i, j - 1] + 1,
                                   dist_matrix[i - 1, j - 1] + cost)
        }
      }

      # The Levenshtein distance is the value in the bottom-right cell of the matrix
      return(dist_matrix[len1 + 1, len2 + 1])
    }

If you try something like

    levenshtein_distance("kitten", "sitting")

then the result is 3, which matches Wikipedia. This is probably not a very optimized way of computing the Levenshtein distance, and I would also double-check its results with a few more example strings. But at least it works in the R console!
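Since the original question also asked about the Damerau-Levenshtein distance, here is a minimal sketch of how the same matrix approach could be extended to count a swap of two adjacent characters as a single edit. This is the restricted variant usually called "optimal string alignment" (as far as I know, the stringdist package calls it "osa"), not the unrestricted Damerau-Levenshtein distance. The name osa_distance is made up for this example, and I am assuming the R console in JASP accepts plain base R like this; please check the results against a few strings you can verify by hand.

```r
# Optimal string alignment (restricted Damerau-Levenshtein) distance.
# osa_distance is a hypothetical name, not part of JASP or stringdist.
osa_distance <- function(str1, str2) {
  # Split both strings into single characters
  a <- strsplit(str1, "")[[1]]
  b <- strsplit(str2, "")[[1]]
  len1 <- length(a)
  len2 <- length(b)

  # Same distance matrix as in the plain Levenshtein version
  d <- matrix(0, nrow = len1 + 1, ncol = len2 + 1)
  d[1, ] <- 0:len2
  d[, 1] <- 0:len1

  for (i in seq_len(len1) + 1) {
    for (j in seq_len(len2) + 1) {
      cost <- if (a[i - 1] != b[j - 1]) 1 else 0
      d[i, j] <- min(d[i - 1, j] + 1,         # deletion
                     d[i, j - 1] + 1,         # insertion
                     d[i - 1, j - 1] + cost)  # substitution

      # Extra case: the last two characters of both prefixes are swapped
      if (i > 2 && j > 2 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
        d[i, j] <- min(d[i, j], d[i - 2, j - 2] + 1)
      }
    }
  }

  d[len1 + 1, len2 + 1]
}

osa_distance("ab", "ba")           # 1: one swap instead of two substitutions
osa_distance("kitten", "sitting")  # 3: same as the plain Levenshtein distance
```

To run it over two columns of strings rather than a single pair, something like mapply(osa_distance, x, y) should work, where x and y are character vectors of equal length.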