com.languagecomputer.api.text
Class TextComparator
java.lang.Object
com.languagecomputer.api.text.TextComparator
- All Implemented Interfaces:
- Comparator<Text>
public class TextComparator
- extends Object
- implements Comparator<Text>
A recommended sorting Comparator for Text objects that
occur in the same Document. A strict ordering of
Texts is often not possible, because spans can overlap in numerous
different ways. However, sorting can greatly improve performance compared to
an unsorted Collection. Furthermore, some applications require
sorting in order to present results to the user. As a compromise, the
TextComparator sorts by:
- Start offset, from lowest to highest
- End offset, from highest to lowest
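The two sort keys can be expressed as a plain Java Comparator. This is a minimal sketch, not the library's implementation. `Span` is a hypothetical stand-in for `Text`, because the offset accessors of `Text` are not documented on this page. Records require Java 16 or later.

```java
import java.util.Comparator;

// Minimal sketch of the ordering described above.
// Span is a hypothetical stand-in for Text; only its offsets matter here.
class SpanOrder {
    record Span(int start, int end) {}

    // Start offset ascending; ties broken by end offset descending,
    // so a longer span sorts before a shorter span with the same start.
    static final Comparator<Span> ORDER = (a, b) -> {
        int byStart = Integer.compare(a.start(), b.start());
        if (byStart != 0) {
            return byStart;
        }
        return Integer.compare(b.end(), a.end()); // arguments swapped: descending
    };
}
```

Swapping the arguments in the second `Integer.compare` reverses only the tie-breaking key, leaving the primary key ascending.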
For example, given the sentence:
King Richard I of England was known as the Lionheart.
the following is the order in which these spans would be sorted, if only
these spans were given:
King Richard I
Richard I of England
Richard I
England
Lionheart
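The example ordering can be reproduced with a short sketch that locates each phrase in the sentence with `String.indexOf` and sorts the resulting spans. `SortExample` and `Span` are hypothetical names, and end offsets are assumed to be exclusive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortExample {
    // Hypothetical stand-in for Text: the covered phrase plus its offsets.
    record Span(String text, int start, int end) {}

    // Start ascending, then end descending (the ordering described above).
    static final Comparator<Span> ORDER = (a, b) -> {
        int byStart = Integer.compare(a.start(), b.start());
        return byStart != 0 ? byStart : Integer.compare(b.end(), a.end());
    };

    // Builds a span for the first occurrence of phrase in sentence.
    static Span find(String sentence, String phrase) {
        int start = sentence.indexOf(phrase);
        return new Span(phrase, start, start + phrase.length());
    }

    static List<String> sortedExample() {
        String s = "King Richard I of England was known as the Lionheart.";
        // Deliberately listed out of order.
        List<Span> spans = new ArrayList<>(List.of(
            find(s, "England"),
            find(s, "Richard I"),
            find(s, "Lionheart"),
            find(s, "King Richard I"),
            find(s, "Richard I of England")));
        spans.sort(ORDER);
        List<String> out = new ArrayList<>();
        for (Span span : spans) {
            out.add(span.text());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints, one per line: King Richard I, Richard I of England,
        // Richard I, England, Lionheart
        sortedExample().forEach(System.out::println);
    }
}
```

"Richard I of England" and "Richard I" share the start offset 5, so the end offset decides between them and the longer span comes first.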
This sort ordering was chosen as a priority sort for eliminating overlapping
spans. In other words, if a system cannot handle overlapping spans, then
one way to decide which spans to keep is to iterate over a list sorted this
way, choosing each item that does not intersect any of the previously chosen
items. For the example above, this yields the
following items:
King Richard I
England
Lionheart
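The greedy selection described above can be sketched as follows. The names are hypothetical and the offsets are hard-coded from the example sentence. Spans are assumed to be half-open intervals [start, end), so two spans that merely touch do not intersect. This page does not state whether Text treats its end offset as inclusive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class GreedySelection {
    // Hypothetical stand-in for Text, with half-open offsets [start, end).
    record Span(String text, int start, int end) {
        boolean intersects(Span other) {
            return start < other.end() && other.start() < end;
        }
    }

    // Start ascending, then end descending, as described above.
    static final Comparator<Span> ORDER = (a, b) -> {
        int byStart = Integer.compare(a.start(), b.start());
        return byStart != 0 ? byStart : Integer.compare(b.end(), a.end());
    };

    // Walks the sorted spans and keeps each one that does not intersect
    // any span kept so far.
    static List<Span> selectNonOverlapping(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(ORDER);
        List<Span> chosen = new ArrayList<>();
        for (Span candidate : sorted) {
            boolean clashes = false;
            for (Span kept : chosen) {
                if (candidate.intersects(kept)) {
                    clashes = true;
                    break;
                }
            }
            if (!clashes) {
                chosen.add(candidate);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Offsets into "King Richard I of England was known as the Lionheart."
        List<Span> spans = List.of(
            new Span("Lionheart", 43, 52),
            new Span("Richard I", 5, 14),
            new Span("England", 18, 25),
            new Span("Richard I of England", 5, 25),
            new Span("King Richard I", 0, 14));
        // Prints, one per line: King Richard I, England, Lionheart
        for (Span s : selectNonOverlapping(spans)) {
            System.out.println(s.text());
        }
    }
}
```

Because candidates arrive in start order and the chosen spans never overlap, checking only the most recently chosen span would suffice. The inner loop checks all of them to mirror the description above.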
One sort ordering and one non-intersecting span selection algorithm clearly
do not fit all needs, so this class is merely provided as an aid that
illustrates one way this may be done.
NOTE: Again, this Comparator assumes that all of the
Texts to be compared exist within the same Document.
Attempting to use the TextComparator with Texts
from different Documents will result in an
IllegalArgumentException being thrown. Specifically, the
Text.getDocument() method must return the exact same object for
all compared spans.
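The same-document requirement can be enforced with an identity check before the offsets are compared. This is a minimal sketch, not the library's code. `Doc` and `Span` are hypothetical stand-ins for `Document` and `Text`, and the check uses `!=` because the note requires the exact same object, not merely an equal one.

```java
import java.util.Comparator;

class SameDocumentCheck {
    // Hypothetical stand-ins: only object identity of the document matters.
    static final class Doc { }
    record Span(Doc document, int start, int end) {}

    static final Comparator<Span> ORDER = (a, b) -> {
        // Reference comparison: both spans must point at the same Doc object.
        if (a.document() != b.document()) {
            throw new IllegalArgumentException("Spans belong to different documents");
        }
        int byStart = Integer.compare(a.start(), b.start());
        return byStart != 0 ? byStart : Integer.compare(b.end(), a.end());
    };
}
```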
- Since:
- 1.0
- Author:
- Kirk Roberts
Method Summary
- int compare(Text text1, Text text2)
  Compares the two Text objects according to the sorting algorithm
  described above.
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
INSTANCE
public static final TextComparator INSTANCE
TextComparator
public TextComparator()
compare
public int compare(Text text1,
Text text2)
- Compares the two
Text objects according to the sorting algorithm
described above.
- Specified by:
compare in interface Comparator<Text>
- Parameters:
text1 - The first Text to compare.
text2 - The second Text to compare.
- Returns:
- A number less than
0 if text1 should be
ordered before text2, greater than 0 if it
should be ordered after text2, or 0 if they are
equivalent and their relative order is not important.
Copyright © 2009. All Rights Reserved.