com.languagecomputer.api.text
Interface Text

All Known Subinterfaces:
Answer, Attribute, Document, Entity, Event, MentionChain, QuestionAnswerPair, SpatialSpan, TemporalSpan
All Known Implementing Classes:
DefaultAnswer, DefaultAttribute, DefaultDocument, DefaultEntity, DefaultEvent, DefaultMentionChain, DefaultQuestionAnswerPair, DefaultSpatialSpan, DefaultTemporalSpan, DefaultText, GenericDefaultText

public interface Text

Root interface of all unstructured text representation objects. Provides access to character offsets, the containing Document, and surrounding Text objects.

Since:
1.0
Author:
Kirk Roberts

Method Summary
 AnnotationType getAnnotationType()
          Returns the AnnotationType that describes this Text.
<T extends Text> Collection<T> getCongruentAnnotations(AnnotationType<T> type)
          Returns all the annotations congruent with this Text that correspond to the given AnnotationType.
 Document getDocument()
          Returns the Document in which this Text exists.
 String getDocumentID()
          Returns the Document ID that identifies this Text.
 int getEndCharOffset()
          Returns the (exclusive) end character offset for this Text object within the (processed) Document.
<T extends Text> Collection<T> getIntersectingAnnotations(AnnotationType<T> type)
          Returns all the annotations that intersect this Text and correspond to the given AnnotationType.
 String getRawString()
          Returns the raw String value that this Text spans.
 int getStartCharOffset()
          Returns the start character offset for this Text object within the (processed) Document.
<T extends Text> Collection<T> getSubAnnotations(AnnotationType<T> type)
          Returns all the annotations within this Text that correspond to the given AnnotationType.
<T extends Text> Collection<T> getSuperAnnotations(AnnotationType<T> type)
          Returns all the annotations that contain this Text as a subspan and correspond to the given AnnotationType.
 

Method Detail

getAnnotationType

AnnotationType getAnnotationType()
Returns the AnnotationType that describes this Text.

Returns:
The AnnotationType of this Text.
See Also:
AnnotationType

getDocumentID

String getDocumentID()
Returns the Document ID that identifies this Text.

Returns:
The ID for the Document that contains this Text.

getDocument

Document getDocument()
Returns the Document in which this Text exists.

Returns:
The Document that contains this Text.

getStartCharOffset

int getStartCharOffset()
Returns the start character offset for this Text object within the (processed) Document. This offset does not necessarily line up with the start character offset from the unprocessed document/file.

Returns:
The inclusive start offset.

getEndCharOffset

int getEndCharOffset()
Returns the (exclusive) end character offset for this Text object within the (processed) Document. This offset does not necessarily line up with the end character offset from the unprocessed document/file.

Returns:
The exclusive end offset.
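
Because the start offset is inclusive and the end offset is exclusive, the span length is the end offset minus the start offset, and the pair can be passed directly to String.substring. The following is a minimal sketch of that arithmetic only. OffsetSketch, its methods, and the sample string are hypothetical and not part of this API, and this page does not state whether getRawString() is exactly such a substring of the processed text.

```java
// Illustrative only: OffsetSketch is a hypothetical helper, not part of
// com.languagecomputer.api.text.
final class OffsetSketch {
    private OffsetSketch() {}

    // Length of a half-open span [startCharOffset, endCharOffset).
    static int length(int startCharOffset, int endCharOffset) {
        return endCharOffset - startCharOffset;
    }

    // Characters covered by the span in the (processed) document text.
    static String covered(String processedText, int startCharOffset, int endCharOffset) {
        return processedText.substring(startCharOffset, endCharOffset);
    }

    public static void main(String[] args) {
        String processed = "Kirk wrote the API.";
        // Offsets 0 (inclusive) to 4 (exclusive) cover the first word.
        System.out.println(covered(processed, 0, 4));
        System.out.println(length(0, 4));
    }
}
```

Half-open offsets also mean that two adjacent spans share a boundary value: the end offset of one equals the start offset of the next, with no character counted twice.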

getRawString

String getRawString()
Returns the raw String value that this Text spans. Includes whitespace (spaces, tabs, newlines, etc.) but not the original markup (HTML tags, PDF and MS Word markup, etc.). Might contain additional newlines along paragraph boundaries for some large documents.

Returns:
The raw string with no mark-up.

getCongruentAnnotations

<T extends Text> Collection<T> getCongruentAnnotations(AnnotationType<T> type)
Returns all the annotations congruent with this Text that correspond to the given AnnotationType.

Type Parameters:
T - The Text sub-class corresponding to the given AnnotationType.
Parameters:
type - The AnnotationType that all returned items will match.
Returns:
A Collection of objects matching the given AnnotationType. The annotations are returned in a semi-sorted order: non-intersecting objects are sorted by their order in the document, but no guarantee is made about the relative order of intersecting objects.
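
The semi-sorted guarantee can be read as a property of pairs: any two returned objects that do not intersect appear in document order, while intersecting objects may appear in either order. The sketch below checks that property and shows one way a caller could impose a total order when one is needed. Span, SemiSortedSketch, and the tie-breaking rule (start offset, then end offset) are assumptions made for illustration and are not part of this API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: Span is a hypothetical stand-in for a Text's offsets.
final class Span {
    final int start; // inclusive
    final int end;   // exclusive

    Span(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

final class SemiSortedSketch {
    private SemiSortedSketch() {}

    // Two half-open spans intersect when each starts before the other ends.
    static boolean intersects(Span a, Span b) {
        return a.start < b.end && b.start < a.end;
    }

    // True if every non-intersecting pair appears in document order.
    static boolean isSemiSorted(List<Span> spans) {
        for (int i = 0; i < spans.size(); i++) {
            for (int j = i + 1; j < spans.size(); j++) {
                Span a = spans.get(i);
                Span b = spans.get(j);
                // a is listed first; if the two do not intersect, a must end
                // at or before the point where b starts.
                if (!intersects(a, b) && a.end > b.start) {
                    return false;
                }
            }
        }
        return true;
    }

    // Returns a copy ordered by start offset, then end offset (an assumed rule).
    static List<Span> totallySorted(List<Span> spans) {
        List<Span> copy = new ArrayList<>(spans);
        copy.sort(Comparator.<Span>comparingInt(s -> s.start).thenComparingInt(s -> s.end));
        return copy;
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
                new Span(0, 4), new Span(2, 6), new Span(1, 3), new Span(10, 12));
        System.out.println(isSemiSorted(spans));
        for (Span s : totallySorted(spans)) {
            System.out.println(s.start + "-" + s.end);
        }
    }
}
```

Code that depends on a specific order among overlapping annotations should sort the returned Collection itself rather than rely on the order it receives.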

getSubAnnotations

<T extends Text> Collection<T> getSubAnnotations(AnnotationType<T> type)
Returns all the annotations within this Text that correspond to the given AnnotationType.

Type Parameters:
T - The Text sub-class corresponding to the given AnnotationType.
Parameters:
type - The AnnotationType that all returned items will match.
Returns:
A Collection of objects matching the given AnnotationType. The annotations are returned in a semi-sorted order: non-intersecting objects are sorted by their order in the document, but no guarantee is made about the relative order of intersecting objects.

getSuperAnnotations

<T extends Text> Collection<T> getSuperAnnotations(AnnotationType<T> type)
Returns all the annotations that contain this Text as a subspan and correspond to the given AnnotationType.

Type Parameters:
T - The Text sub-class corresponding to the given AnnotationType.
Parameters:
type - The AnnotationType that all returned items will match.
Returns:
A Collection of objects matching the given AnnotationType. The annotations are returned in a semi-sorted order: non-intersecting objects are sorted by their order in the document, but no guarantee is made about the relative order of intersecting objects.

getIntersectingAnnotations

<T extends Text> Collection<T> getIntersectingAnnotations(AnnotationType<T> type)
Returns all the annotations that intersect this Text and correspond to the given AnnotationType.

Type Parameters:
T - The Text sub-class corresponding to the given AnnotationType.
Parameters:
type - The AnnotationType that all returned items will match.
Returns:
A Collection of objects matching the given AnnotationType. The annotations are returned in a semi-sorted order: non-intersecting objects are sorted by their order in the document, but no guarantee is made about the relative order of intersecting objects.
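
The four annotation lookups (congruent, sub, super, intersecting) differ only in the offset relation they test. The sketch below expresses one plausible reading of each relation on half-open offsets. OffsetSpan and its predicates are hypothetical, and the definitions are assumptions: this page does not state, for example, whether "congruent" means identical offsets or whether a congruent annotation also counts as a sub-annotation.

```java
// Illustrative only: OffsetSpan and these predicates are hypothetical, and the
// exact definitions used by the library are assumptions.
final class OffsetSpan {
    final int start; // inclusive
    final int end;   // exclusive

    OffsetSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // Assumed meaning of "congruent": identical start and end offsets.
    boolean isCongruentWith(OffsetSpan o) {
        return start == o.start && end == o.end;
    }

    // Assumed meaning of "within": o lies entirely inside this span.
    // o is then a sub-annotation of this, and this a super-annotation of o.
    boolean contains(OffsetSpan o) {
        return start <= o.start && o.end <= end;
    }

    // Assumed meaning of "intersect": the spans share at least one character.
    boolean intersects(OffsetSpan o) {
        return start < o.end && o.start < end;
    }

    public static void main(String[] args) {
        OffsetSpan sentence = new OffsetSpan(0, 20);
        OffsetSpan token = new OffsetSpan(5, 10);
        System.out.println(sentence.contains(token));
        System.out.println(token.contains(sentence));
        System.out.println(sentence.intersects(token));
        System.out.println(sentence.isCongruentWith(token));
    }
}
```

Under these assumed definitions, congruent spans contain each other, and any span contained in another non-empty span also intersects it, so the result sets of the four methods may overlap.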


Copyright © 2009. All Rights Reserved.