no.priv.garshol.duke.comparators
Class WeightedLevenshtein

java.lang.Object
  extended by no.priv.garshol.duke.comparators.WeightedLevenshtein
All Implemented Interfaces:
Comparator

public class WeightedLevenshtein
extends Object
implements Comparator

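An implementation of the Levenshtein distance metric in which each editing operation carries its own weight, so that not all edits count equally. The weights are supplied by a WeightedLevenshtein.WeightEstimator. For a useful explanation of the underlying algorithm, see http://www.let.rug.nl/kleiweg/lev/levenshtein.html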
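
To make the idea of weighted editing operations concrete, here is a minimal, self-contained sketch of a weighted edit distance. It is not Duke's source code: the EditWeights interface, its method names, and the DIGITS_HEAVY example weights are assumptions made for illustration, since this page does not show the members of WeightedLevenshtein.WeightEstimator.

```java
// Hypothetical stand-in for WeightedLevenshtein.WeightEstimator; the real
// interface's methods are not listed on this page.
interface EditWeights {
  double substitute(char from, char to);
  double delete(char ch);
  double insert(char ch);
}

class WeightedDistanceSketch {
  // Example weights (an assumption): any edit touching a digit costs 2.0,
  // every other edit costs 1.0.
  static final EditWeights DIGITS_HEAVY = new EditWeights() {
    private double cost(char ch) {
      return Character.isDigit(ch) ? 2.0 : 1.0;
    }
    public double substitute(char from, char to) {
      return Math.max(cost(from), cost(to));
    }
    public double delete(char ch) { return cost(ch); }
    public double insert(char ch) { return cost(ch); }
  };

  // Full-matrix dynamic programme: d[i][j] is the cheapest way to turn the
  // first i characters of s1 into the first j characters of s2.
  static double distance(String s1, String s2, EditWeights w) {
    int n = s1.length();
    int m = s2.length();
    double[][] d = new double[n + 1][m + 1];

    // First column: delete every character of the s1 prefix.
    for (int i = 1; i <= n; i++) {
      d[i][0] = d[i - 1][0] + w.delete(s1.charAt(i - 1));
    }
    // First row: insert every character of the s2 prefix.
    for (int j = 1; j <= m; j++) {
      d[0][j] = d[0][j - 1] + w.insert(s2.charAt(j - 1));
    }

    for (int i = 1; i <= n; i++) {
      char c1 = s1.charAt(i - 1);
      for (int j = 1; j <= m; j++) {
        char c2 = s2.charAt(j - 1);
        // Matching characters cost nothing; otherwise pay the substitution weight.
        double sub = d[i - 1][j - 1] + (c1 == c2 ? 0.0 : w.substitute(c1, c2));
        double del = d[i - 1][j] + w.delete(c1);
        double ins = d[i][j - 1] + w.insert(c2);
        d[i][j] = Math.min(sub, Math.min(del, ins));
      }
    }
    return d[n][m];
  }
}
```

With these example weights, a mismatch between two letters adds 1.0 to the distance while a mismatch involving a digit adds 2.0, so strings that differ in their numeric parts are pushed further apart than strings that differ in a letter.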


Nested Class Summary
static class WeightedLevenshtein.DefaultWeightEstimator
           
static interface WeightedLevenshtein.WeightEstimator
          The object which supplies the actual weights for editing operations.
 
Constructor Summary
WeightedLevenshtein()
           
 
Method Summary
static double compactDistance(String s1, String s2, WeightedLevenshtein.WeightEstimator weight)
          Optimized version of the Wagner & Fischer algorithm that only keeps a single column in the matrix in memory at a time.
 double compare(String s1, String s2)
           
static double distance(String s1, String s2, WeightedLevenshtein.WeightEstimator weight)
           
 boolean isTokenized()
          Returns true if the comparator breaks string values up into tokens when comparing.
 void setEstimator(WeightedLevenshtein.WeightEstimator estimator)
           
static void timing(String s1, String s2)
          Utility function for testing Levenshtein performance.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WeightedLevenshtein

public WeightedLevenshtein()

Method Detail

compare

public double compare(String s1,
                      String s2)
Specified by:
compare in interface Comparator

isTokenized

public boolean isTokenized()
Description copied from interface: Comparator
Returns true if the comparator breaks string values up into tokens when comparing. This is necessary because tokenization affects how values are indexed.

Specified by:
isTokenized in interface Comparator

setEstimator

public void setEstimator(WeightedLevenshtein.WeightEstimator estimator)

distance

public static double distance(String s1,
                              String s2,
                              WeightedLevenshtein.WeightEstimator weight)

compactDistance

public static double compactDistance(String s1,
                                     String s2,
                                     WeightedLevenshtein.WeightEstimator weight)
Optimized version of the Wagner & Fischer algorithm that only keeps a single column in the matrix in memory at a time. It implements a simple cutoff, but otherwise still computes every cell of the matrix.
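
The single-column idea can be sketched as follows. This is an illustration rather than the method's actual source: the EditCosts interface and the UNIT weights are hypothetical stand-ins for WeightedLevenshtein.WeightEstimator, and the cutoff mentioned above is deliberately left out because its exact behaviour is not described on this page.

```java
// Hypothetical stand-in for WeightedLevenshtein.WeightEstimator.
interface EditCosts {
  double substitute(char from, char to);
  double delete(char ch);
  double insert(char ch);
}

class CompactDistanceSketch {
  // Unit weights (an assumption), which reduce to plain Levenshtein distance.
  static final EditCosts UNIT = new EditCosts() {
    public double substitute(char from, char to) { return 1.0; }
    public double delete(char ch) { return 1.0; }
    public double insert(char ch) { return 1.0; }
  };

  static double distance(String s1, String s2, EditCosts w) {
    int n = s1.length();
    int m = s2.length();

    // column[i] holds the cost of turning the first i characters of s1 into
    // the prefix of s2 processed so far. Initially that prefix is empty.
    double[] column = new double[n + 1];
    for (int i = 1; i <= n; i++) {
      column[i] = column[i - 1] + w.delete(s1.charAt(i - 1));
    }

    for (int j = 1; j <= m; j++) {
      char c2 = s2.charAt(j - 1);
      double diagonal = column[0];   // value for (i-1, j-1) before overwriting
      column[0] += w.insert(c2);     // empty s1 prefix: insert all of s2[0..j)

      for (int i = 1; i <= n; i++) {
        char c1 = s1.charAt(i - 1);
        double previous = column[i]; // value for (i, j-1), about to be replaced

        double sub = diagonal + (c1 == c2 ? 0.0 : w.substitute(c1, c2));
        double del = column[i - 1] + w.delete(c1); // column[i-1] is already (i-1, j)
        double ins = previous + w.insert(c2);

        column[i] = Math.min(sub, Math.min(del, ins));
        diagonal = previous;         // becomes (i-1, j-1) for the next row
      }
    }
    return column[n];
  }
}
```

Memory use drops from one cell per pair of positions to one cell per character of s1, while the number of cells evaluated stays the same.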


timing

public static void timing(String s1,
                          String s2)
Utility function for testing Levenshtein performance.



Copyright © 2013. All Rights Reserved.