no.priv.garshol.duke.comparators
Class JaroWinklerTokenized
java.lang.Object
no.priv.garshol.duke.comparators.JaroWinklerTokenized
- All Implemented Interfaces:
- Comparator
public class JaroWinklerTokenized
- extends Object
- implements Comparator
A tokenized approach to string similarity, based on Jaccard
equivalence and the Jaro-Winkler metric.
FIXME: Do we actually need this, or is DiceCoefficientComparator
better? Dice is probably better. However, the code that prevents the
same token from being matched twice is unique to this comparator.
Should we reuse it in Dice, or support more methods than just Dice?
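The idea described above (tokenize both strings, pair tokens by a token-level metric, never match the same token twice, then combine Jaccard-style) can be sketched roughly as follows. This is an illustrative sketch only, not the code of this class: the class name TokenizedSimilaritySketch, the whitespace tokenizer, the greedy pairing, the threshold parameter and the final formula are all assumptions. The token metric is passed in as a parameter; Jaro-Winkler would be plugged in there, and the usage in main substitutes exact equality as a stand-in so the arithmetic is easy to follow.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch; not the actual JaroWinklerTokenized implementation.
class TokenizedSimilaritySketch {

    // Assumption: tokens are separated by whitespace.
    static String[] tokenize(String s) {
        String trimmed = s.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Greedily pairs each token of s1 with its best still-unused token of s2,
    // then returns (sum of pair scores) / (size of the token "union").
    static double compare(String s1, String s2,
                          ToDoubleBiFunction<String, String> tokenMetric,
                          double threshold) {
        String[] t1 = tokenize(s1);
        String[] t2 = tokenize(s2);
        if (t1.length == 0 || t2.length == 0) {
            return 0.0;
        }

        boolean[] used = new boolean[t2.length]; // tokens of s2 already paired
        double sum = 0.0;
        int matched = 0;

        for (String a : t1) {
            int best = -1;
            double bestScore = threshold; // pairs scoring below this are ignored
            for (int j = 0; j < t2.length; j++) {
                if (used[j]) {
                    continue; // never match the same token twice
                }
                double score = tokenMetric.applyAsDouble(a, t2[j]);
                if (score >= bestScore) {
                    best = j;
                    bestScore = score;
                }
            }
            if (best >= 0) {
                used[best] = true;
                sum += bestScore;
                matched++;
            }
        }
        // Jaccard-style denominator: |A| + |B| - |matched pairs|
        return sum / (t1.length + t2.length - matched);
    }

    public static void main(String[] args) {
        // Stand-in token metric: exact equality (1.0 or 0.0).
        ToDoubleBiFunction<String, String> exact =
            (x, y) -> x.equals(y) ? 1.0 : 0.0;

        // Two of three tokens match: 2 / (3 + 3 - 2)
        System.out.println(compare("a b c", "b c d", exact, 0.5)); // prints 0.5

        // The single "a" on the right can be used only once: 1 / (2 + 1 - 1)
        System.out.println(compare("a a", "a", exact, 0.5)); // prints 0.5
    }
}
```

With exact equality as the token metric the score reduces to plain Jaccard similarity over tokens; a fuzzy token metric would additionally give partial credit to near-matching tokens such as misspellings.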
Method Summary
double compare(String s1, String s2)
boolean isTokenized()
Returns true if the comparator breaks string values up into
tokens when comparing.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
JaroWinklerTokenized
public JaroWinklerTokenized()
isTokenized
public boolean isTokenized()
- Description copied from interface: Comparator
- Returns true if the comparator breaks string values up into
tokens when comparing. Necessary because this impacts indexing of
values.
- Specified by: isTokenized in interface Comparator
compare
public double compare(String s1,
String s2)
- Specified by: compare in interface Comparator
Copyright © 2013. All Rights Reserved.