Any way of calculating the Levenshtein distance

Hi

I'm fairly new to JASP and statistics, so please bear with me. :)

I'm trying to find a way of calculating either the Levenshtein distance or the Damerau-Levenshtein distance in JASP. Does that exist?

I know that R has this (the stringdist() function in the stringdist package). It apparently takes two strings as arguments and returns the distance between them. Would it be possible to use this in JASP's R console? I've never used the R console myself, so I'll have to figure out how it all works.

Many thanks!

Comments

  • Hi IamAnna,

    I doubt we have this. Would be a good feature request on our GitHub page (for details see https://jasp-stats.org/2018/03/29/request-feature-report-bug-jasp/). But perhaps the R console option also works -- I'll ask the team.

    EJ

  • We don't have the R package stringdist available in JASP, so unfortunately that won't work. Nevertheless, I asked ChatGPT and it suggested the following:

    levenshtein_distance <- function(str1, str2) {
     len1 <- nchar(str1)
     len2 <- nchar(str2)
    
     # Create a matrix to store the distances
     dist_matrix <- matrix(0, nrow = len1 + 1, ncol = len2 + 1)
    
     # Initialize the first row and first column of the matrix
     dist_matrix[1,] <- 0:(len2)
     dist_matrix[,1] <- 0:(len1)
    
     # Fill in the rest of the matrix; seq_len() skips the loop for an empty string
     for (i in seq_len(len1) + 1) {
       for (j in seq_len(len2) + 1) {
         cost <- if (substr(str1, i - 1, i - 1) != substr(str2, j - 1, j - 1)) 1 else 0
         dist_matrix[i, j] <- min(dist_matrix[i - 1, j] + 1, dist_matrix[i, j - 1] + 1, dist_matrix[i - 1, j - 1] + cost)
       }
     }
    
     # The Levenshtein distance is the value in the bottom-right cell of the matrix
     return(dist_matrix[len1 + 1, len2 + 1])
    }
    

    If you try something like

    levenshtein_distance("kitten", "sitting")
    

    then the result is 3, which matches Wikipedia. This is probably not a very optimized way of computing the Levenshtein distance, and I would also double-check its results with a few more example strings. But at least it works in the R console!
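One way to do the double-checking suggested above is to compare against adist(), which ships with base R in the utils package and computes a generalized Levenshtein distance. This is only a sketch: whether adist() can be called from JASP's R console is an assumption I have not verified, and compare_with_adist() is a hypothetical helper name, not part of JASP or R.

```r
# Hypothetical helper: runs a two-argument distance function `f` on each pair
# of strings and puts its result next to the one from utils::adist().
compare_with_adist <- function(f, pairs) {
  rows <- lapply(pairs, function(p) {
    data.frame(
      str1 = p[1],
      str2 = p[2],
      candidate = f(p[1], p[2]),
      # adist() returns a matrix, so take the single cell for one pair
      reference = as.integer(utils::adist(p[1], p[2])[1, 1]),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  out$match <- out$candidate == out$reference
  out
}

# A few example pairs; extend this list with strings from your own data
pairs <- list(
  c("kitten", "sitting"),
  c("flaw", "lawn"),
  c("saturday", "sunday"),
  c("abc", "abc")
)

# Only runs if levenshtein_distance() from the post above has been defined
if (exists("levenshtein_distance")) {
  print(compare_with_adist(levenshtein_distance, pairs))
}
```

Any row where match is FALSE points to an input worth looking at more closely. If adist() does turn out to be available in the console, it could also be used directly instead of the hand-written function.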
