com.languagecomputer.api
Interface DocumentWebService

All Superinterfaces:
BaseWebService
All Known Implementing Classes:
DocumentWebServiceImpl

public interface DocumentWebService
extends BaseWebService

Interface for the document preprocessing web service. It supports two kinds of input: paths and content. Paths are given as URLs, which the service downloads (process), while content is passed directly as a byte array (processContent). The document type can be specified as a parameter or detected automatically. The service returns the processed document.
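The split between the two entry points can be sketched as a small client-side dispatcher. This is a minimal sketch, not part of the API: DocumentService is a hypothetical mirror of the two methods documented below (so the code compiles without the com.languagecomputer.api classes), DocumentDispatch is a hypothetical helper, and passing the literal strings "AUTO" and "XML" for the documented defaults is an assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

// Hypothetical mirror of the two documented methods, so this sketch compiles
// without the com.languagecomputer.api classes on the classpath.
interface DocumentService {
    String process(String sessionID, String url, String type, String format) throws Exception;
    String processContent(String sessionID, byte[] content, String type, String format) throws Exception;
}

// Hypothetical helper, not part of the documented API.
class DocumentDispatch {
    // Assumption: these literals are what the service expects for the documented defaults.
    static final String TYPE = "AUTO";
    static final String FORMAT = "XML";

    // True if the input parses as an absolute http or https URL with a host.
    static boolean isRemoteUrl(String input) {
        try {
            URI uri = new URI(input);
            String scheme = uri.getScheme();
            return scheme != null
                && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))
                && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false; // not a URL, so treat it as content
        }
    }

    // URLs go to process(); anything else is sent as UTF-8 bytes to processContent().
    static String run(DocumentService service, String sessionID, String input) throws Exception {
        if (isRemoteUrl(input)) {
            return service.process(sessionID, input, TYPE, FORMAT);
        }
        byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
        return service.processContent(sessionID, bytes, TYPE, FORMAT);
    }
}
```

Text is encoded as UTF-8 explicitly because processContent takes bytes; relying on the platform default charset could alter non-ASCII input.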

Since:
1.0
Author:
Toby Jungen / Kirk Roberts

Method Summary
 String process(String sessionID, String url, String type, String format)
          Processes the document at the given URL by downloading and ingesting it.
 String processContent(String sessionID, byte[] content, String type, String format)
          Processes a document with the given content.
 
Methods inherited from interface com.languagecomputer.api.BaseWebService
registerSession, unregisterSession
 

Method Detail

process

String process(String sessionID,
               String url,
               String type,
               String format)
               throws Exception
Processes the document at the given URL by downloading and ingesting it. Strips extraneous markup and normalizes the text.

Parameters:
sessionID - The unique identifier of the session in which this operation runs.
url - The URL of the document to be processed.
type - The type of the document at url. Defaults to AUTO (automatic detection).
format - The format in which the document should be returned. Defaults to XML.
Returns:
The processed document in the format requested.
Throws:
Exception - If there was an error processing the document.
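Because process declares a bare Exception, a caller may want to reject malformed URLs before the remote call and rethrow failures with the URL attached. A minimal sketch under stated assumptions: DocumentService is a hypothetical mirror of the documented signatures, UrlProcessing is a hypothetical helper, and "AUTO" and "XML" are assumed to be accepted as literal values.

```java
import java.net.URI;

// Hypothetical mirror of the documented methods (not the real interface).
interface DocumentService {
    String process(String sessionID, String url, String type, String format) throws Exception;
    String processContent(String sessionID, byte[] content, String type, String format) throws Exception;
}

// Hypothetical helper, not part of the documented API.
class UrlProcessing {
    static String processUrl(DocumentService service, String sessionID, String url) {
        // URI.create throws IllegalArgumentException for invalid syntax.
        URI uri = URI.create(url);
        if (!uri.isAbsolute()) {
            throw new IllegalArgumentException("URL must be absolute: " + url);
        }
        try {
            // Assumption: "AUTO" and "XML" are the literal default values.
            return service.process(sessionID, uri.toString(), "AUTO", "XML");
        } catch (Exception e) {
            // process() declares a bare Exception; rethrow unchecked with context.
            throw new IllegalStateException("Failed to process " + url, e);
        }
    }
}
```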

processContent

String processContent(String sessionID,
                      byte[] content,
                      String type,
                      String format)
                      throws Exception
Processes a document with the given content. Strips extraneous markup and normalizes the text.

Parameters:
sessionID - The unique identifier of the session in which this operation runs.
content - The document content to process.
type - The type of the document in content. Defaults to AUTO (automatic detection).
format - The format in which the document should be returned. Defaults to XML.
Returns:
The processed document in the format requested.
Throws:
Exception - If there was an error processing the document.
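Since processContent takes raw bytes, a local file can be read with Files.readAllBytes and submitted without decoding it first. A minimal sketch; DocumentService is a hypothetical mirror of the documented signatures, FileProcessing is a hypothetical helper, and "AUTO" and "XML" are assumed values for the documented defaults.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical mirror of the documented methods (not the real interface).
interface DocumentService {
    String process(String sessionID, String url, String type, String format) throws Exception;
    String processContent(String sessionID, byte[] content, String type, String format) throws Exception;
}

// Hypothetical helper, not part of the documented API.
class FileProcessing {
    static String processFile(DocumentService service, String sessionID, Path file) throws Exception {
        // Read the file as raw bytes; no charset decoding happens on the client.
        byte[] content = Files.readAllBytes(file);
        // Assumption: "AUTO" requests type detection and "XML" the default output format.
        return service.processContent(sessionID, content, "AUTO", "XML");
    }
}
```

Reading bytes rather than a String avoids decoding the file with the wrong charset on the client side, which presumably matters for non-text formats or files whose encoding is not known in advance.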


Copyright © 2009. All Rights Reserved.