Class Summary

| Class | Description |
|---|---|
| ImageHandler | Knows how to handle image contents. |
| MailHandler | Knows how to extract mail message information and all the text within it, including attachments. |
| MSOfficeHandler | Knows how to handle Microsoft Office documents. |
| MSOfficeHandler.ContentPOIFSReaderListener | |
| MSWordHandler | Knows how to handle Microsoft Office documents. |
| NullHandler | The simplest implementation: does nothing, because the incoming input stream is already an indexable text stream. |
| PDFHandler | This filter takes a PDF input stream and generates its equivalent text input stream. |
| TaggedTextHandler | Handles tagged text files, such as XML or HTML, removing all tags that represent structure or formatting information irrelevant to indexing. |
| TextHandler | The simplest implementation: does nothing, because the incoming input stream is already an indexable text stream. |
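Every handler in this summary is described as turning an incoming stream into an indexable text stream, but the summary does not show the handlers' shared interface or method signatures. The Java sketch below illustrates the two handlers whose behavior needs no external library: a pass-through in the spirit of `NullHandler`/`TextHandler`, and a tag remover in the spirit of `TaggedTextHandler`. The names `ContentHandlerSketch`, `toIndexableText`, `PassThroughHandler`, `TagStrippingHandler` and `HandlerSketchDemo` are hypothetical and are not this package's API. The regex-based tag removal is a deliberate simplification, not necessarily what `TaggedTextHandler` actually does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical contract (assumption): a handler turns a raw document stream
// into an indexable text stream. The real package's interface is not shown.
interface ContentHandlerSketch {
    InputStream toIndexableText(InputStream in) throws IOException;
}

// Pass-through in the spirit of NullHandler/TextHandler: the input is
// already indexable text, so it is returned unchanged.
final class PassThroughHandler implements ContentHandlerSketch {
    @Override
    public InputStream toIndexableText(InputStream in) {
        return in;
    }
}

// Naive tag remover in the spirit of TaggedTextHandler (assumption: regex,
// not a real XML/HTML parser). Each <...> tag becomes a space so that text
// from adjacent elements is not glued together; whitespace is then collapsed.
// Comments containing '>', CDATA sections and entities are not handled.
final class TagStrippingHandler implements ContentHandlerSketch {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    @Override
    public InputStream toIndexableText(InputStream in) throws IOException {
        String markup = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        String text = TAG.matcher(markup).replaceAll(" ");
        text = SPACES.matcher(text).replaceAll(" ").trim();
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }
}

// Hypothetical helper: runs a handler over a UTF-8 string and returns the text.
final class HandlerSketchDemo {
    static String extract(ContentHandlerSketch handler, String input) {
        InputStream source = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8));
        try (InputStream out = handler.toIndexableText(source)) {
            return new String(out.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `HandlerSketchDemo.extract(new TagStrippingHandler(), "<p>Some <b>bold</b> text.</p>")` yields `Some bold text.`. Replacing tags with a space rather than an empty string keeps words from separate elements apart (for example `</h1><p>`), at the cost of splitting a word that has a tag inside it.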