no.priv.garshol.duke.comparators
Class JaroWinklerTokenized

java.lang.Object
  extended by no.priv.garshol.duke.comparators.JaroWinklerTokenized
All Implemented Interfaces:
Comparator

public class JaroWinklerTokenized
extends Object
implements Comparator

A tokenized approach to string similarity, based on Jaccard equivalence and the Jaro-Winkler metric. FIXME: Is this comparator actually needed, or is DiceCoefficientComparator better? Dice is probably better. However, the code that prevents the same token from being matched twice is unique to this comparator. Should that code be reused in Dice, or should more methods than Dice be supported?
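
The description above can be illustrated with a minimal sketch. This is not the Duke source: the class name TokenizedSimilaritySketch, the whitespace tokenization, the 0.9 match threshold, the greedy best-match strategy, and the unconditional Winkler prefix boost are all assumptions made for illustration. It shows one way to combine per-token Jaro-Winkler scores with a Jaccard-style ratio while making sure that a token is never matched twice.

```java
// Illustrative sketch only; names, threshold, and matching strategy are assumptions.
class TokenizedSimilaritySketch {
  static final double THRESHOLD = 0.9; // assumed cut-off for "tokens match"

  // Standard Jaro similarity between two strings.
  static double jaro(String a, String b) {
    if (a.equals(b)) return 1.0;
    int la = a.length(), lb = b.length();
    if (la == 0 || lb == 0) return 0.0;
    int window = Math.max(0, Math.max(la, lb) / 2 - 1);
    boolean[] ma = new boolean[la];
    boolean[] mb = new boolean[lb];
    int matches = 0;
    for (int i = 0; i < la; i++) {
      int lo = Math.max(0, i - window);
      int hi = Math.min(lb - 1, i + window);
      for (int j = lo; j <= hi; j++) {
        if (!mb[j] && a.charAt(i) == b.charAt(j)) {
          ma[i] = true;
          mb[j] = true;
          matches++;
          break;
        }
      }
    }
    if (matches == 0) return 0.0;
    // Count matched characters that appear in a different order.
    int outOfOrder = 0;
    int k = 0;
    for (int i = 0; i < la; i++) {
      if (!ma[i]) continue;
      while (!mb[k]) k++;
      if (a.charAt(i) != b.charAt(k)) outOfOrder++;
      k++;
    }
    double m = matches;
    return (m / la + m / lb + (m - outOfOrder / 2.0) / m) / 3.0;
  }

  // Jaro-Winkler: Jaro plus a boost for a common prefix of up to 4 characters.
  static double jaroWinkler(String a, String b) {
    double j = jaro(a, b);
    int prefix = 0;
    int max = Math.min(4, Math.min(a.length(), b.length()));
    while (prefix < max && a.charAt(prefix) == b.charAt(prefix)) prefix++;
    return j + prefix * 0.1 * (1.0 - j);
  }

  static String[] tokens(String s) {
    String t = s.trim();
    return t.isEmpty() ? new String[0] : t.split("\\s+");
  }

  // Jaccard-style score: matched tokens / (all tokens - matched tokens).
  static double compare(String s1, String s2) {
    if (s1.equals(s2)) return 1.0;
    String[] t1 = tokens(s1), t2 = tokens(s2);
    if (t1.length == 0 || t2.length == 0) return 0.0;
    if (t1.length > t2.length) { // iterate over the shorter token list
      String[] tmp = t1;
      t1 = t2;
      t2 = tmp;
    }
    boolean[] used = new boolean[t2.length]; // tokens of t2 already matched
    int matched = 0;
    for (String tok : t1) {
      int best = -1;
      double bestScore = THRESHOLD;
      for (int j = 0; j < t2.length; j++) {
        if (used[j]) continue; // never match the same token twice
        double score = jaroWinkler(tok, t2[j]);
        if (score >= bestScore) {
          bestScore = score;
          best = j;
        }
      }
      if (best >= 0) {
        used[best] = true;
        matched++;
      }
    }
    return (double) matched / (t1.length + t2.length - matched);
  }
}
```

In this sketch, comparing "a a" with "a" gives 0.5 rather than 1.0, because the single token on one side can be consumed only once. Token order does not matter, so "john smith" and "smith john" score 1.0.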


Constructor Summary
JaroWinklerTokenized()
           
 
Method Summary
 double compare(String s1, String s2)
           
 boolean isTokenized()
          Returns true if the comparator breaks string values up into tokens when comparing.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

JaroWinklerTokenized

public JaroWinklerTokenized()

Method Detail

isTokenized

public boolean isTokenized()
Description copied from interface: Comparator
Returns true if the comparator breaks string values up into tokens when comparing. Necessary because this impacts indexing of values.

Specified by:
isTokenized in interface Comparator
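
To illustrate why the return value of isTokenized can matter for indexing, here is a small hypothetical sketch. It is not Duke's indexing code: the TokenAware interface is a stand-in for the isTokenized() part of Comparator, and IndexTermsSketch with its whitespace splitting is an assumption. The idea shown is that an index could store one term per token for tokenized comparators, and the whole value as a single term otherwise.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the isTokenized() part of the Comparator interface.
interface TokenAware {
  boolean isTokenized();
}

class IndexTermsSketch {
  // Returns the terms a hypothetical index would store for one value.
  static List<String> indexTerms(TokenAware comparator, String value) {
    String trimmed = value.trim();
    if (trimmed.isEmpty()) {
      return Collections.emptyList();
    }
    if (comparator.isTokenized()) {
      // Tokenized comparators: index each token separately (assumed behavior).
      return Arrays.asList(trimmed.split("\\s+"));
    }
    // Non-tokenized comparators: index the whole value as one term.
    return Collections.singletonList(trimmed);
  }
}
```

With a tokenized comparator, the value "john smith" would yield the two terms "john" and "smith"; with a non-tokenized one it would stay a single term.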

compare

public double compare(String s1,
                      String s2)
Specified by:
compare in interface Comparator


Copyright © 2013. All Rights Reserved.