java.lang.Object
  com.itextpdf.text.pdf.parser.LocationAwareTextExtractingPdfContentRenderListener
public class LocationAwareTextExtractingPdfContentRenderListener
Development preview - this class (and all of the parser classes) is still under
heavy development, and both its behavior and interface are subject to change.
A text extraction renderer that keeps track of the relative position of text on the page.
The resultant text will be relatively consistent with the physical layout that most
PDF files have on screen.
This renderer keeps track of the orientation of each chunk of text and of its distance (both perpendicular
and parallel) relative to the unit vector of that orientation. Text is ordered by
orientation, then by perpendicular distance, then by parallel distance. Text with the same
perpendicular distance but a different parallel distance is separated by tab characters.
If chunks of text are close to each other on the same line (within 4 space widths), they
are kept together (separated by a single space).
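The ordering and separator rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the class's actual implementation: `LocationOrderingSketch`, `Chunk`, and all field names are hypothetical, orientation and perpendicular distance are assumed to be quantized to integers so that equal values compare exactly, and the line break emitted between lines is an assumption as well (the description only mentions tabs and spaces).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class LocationOrderingSketch {

    /** Hypothetical stand-in for a located chunk of text; not the real LocationOnPage. */
    static final class Chunk {
        final String text;
        final int orientation;     // assumed: quantized direction of the baseline
        final int perpendicular;   // assumed: quantized distance across the orientation
        final float parallelStart; // where the chunk starts along the orientation
        final float parallelEnd;   // where the chunk ends along the orientation
        final float spaceWidth;    // width of a single space in the chunk's font

        Chunk(String text, int orientation, int perpendicular,
              float parallelStart, float parallelEnd, float spaceWidth) {
            this.text = text;
            this.orientation = orientation;
            this.perpendicular = perpendicular;
            this.parallelStart = parallelStart;
            this.parallelEnd = parallelEnd;
            this.spaceWidth = spaceWidth;
        }
    }

    /** Orders chunks by orientation, perpendicular, then parallel distance and joins them. */
    static String join(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator
                .comparingInt((Chunk c) -> c.orientation)
                .thenComparingInt(c -> c.perpendicular)
                .thenComparingDouble(c -> c.parallelStart));

        StringBuilder out = new StringBuilder();
        Chunk prev = null;
        for (Chunk c : sorted) {
            if (prev != null) {
                boolean sameLine = prev.orientation == c.orientation
                        && prev.perpendicular == c.perpendicular;
                if (!sameLine) {
                    out.append('\n'); // assumption: a new line starts a new row of output
                } else {
                    float gap = c.parallelStart - prev.parallelEnd;
                    // within 4 space widths: keep together with a space, otherwise a tab
                    out.append(gap <= 4 * prev.spaceWidth ? ' ' : '\t');
                }
            }
            out.append(c.text);
            prev = c;
        }
        return out.toString();
    }
}
```

Chunks may be supplied in any order: `join` sorts a copy of the list, so the result depends only on the positions of the chunks, not on the order in which they were rendered.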
This renderer also uses a simple strategy based on the font metrics to determine if
a blank space should be inserted into the output.
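One way a font-metric space heuristic of this kind could look is sketched below. The description does not state the actual rule, so everything here is an assumption: the class name `SpaceHeuristicSketch`, the method `needsSpace`, and in particular the threshold of half a space width are hypothetical and chosen only for illustration.

```java
class SpaceHeuristicSketch {

    /**
     * Decides whether a blank should be inserted between two consecutive
     * pieces of text on the same line.
     *
     * @param previousEnd position where the previous text ended
     * @param nextStart   position where the next text starts
     * @param spaceWidth  width of a single space character in the current font
     */
    static boolean needsSpace(float previousEnd, float nextStart, float spaceWidth) {
        float gap = nextStart - previousEnd;
        // assumed threshold: a gap wider than half a space counts as a word break
        return gap > spaceWidth / 2f;
    }

    /** Appends the next text to the buffer, inserting a blank when the gap calls for one. */
    static void append(StringBuilder buffer, String next,
                       float previousEnd, float nextStart, float spaceWidth) {
        if (buffer.length() > 0 && needsSpace(previousEnd, nextStart, spaceWidth)) {
            buffer.append(' ');
        }
        buffer.append(next);
    }
}
```

Measuring the gap in units of the font's own space width keeps the decision independent of font size, which is the main reason to base such a rule on font metrics rather than on absolute distances.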
| Nested Class Summary | |
|---|---|
| private static class | LocationAwareTextExtractingPdfContentRenderListener.LocationOnPage<br>Represents a chunk of text, its orientation, and location relative to the orientation vector |
| Field Summary | |
|---|---|
| private Vector | chunkEnd<br>the most recent ending point of the current chunk of text |
| private Vector | chunkStart<br>the starting point of the current line of text |
| private StringBuffer | chunkText<br>contains the text accumulated so far for the current chunk |
| (package private) static boolean | DUMP_STATE<br>set to true for debugging |
| (package private) boolean | firstRender<br>whether the operation is the first render of the page |
| private List<LocationAwareTextExtractingPdfContentRenderListener.LocationOnPage> | locationalResult<br>a summary of all found text |
| Constructor Summary |
|---|
| LocationAwareTextExtractingPdfContentRenderListener()<br>Creates a new text extraction renderer. |
| Method Summary | |
|---|---|
| void | beginTextBlock()<br>Called when a new text block is beginning (i.e. |
| private void | captureChunk(String text)<br>Captures the specified text as a single, cohesive chunk of text using the current line start and end information |
| private void | dumpState()<br>Used for debugging only |
| void | endTextBlock()<br>Called when a text block has ended (i.e. |
| String | getResultantText()<br>Returns the result so far. |
| void | renderImage(ImageRenderInfo renderInfo)<br>no-op method - this renderer isn't interested in image events |
| void | renderText(TextRenderInfo renderInfo)<br>Captures text using a relatively advanced algorithm for determining text chunks and spaces |
| void | reset()<br>Resets the internal state |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
static boolean DUMP_STATE
private Vector chunkStart
private Vector chunkEnd
private StringBuffer chunkText
private List<LocationAwareTextExtractingPdfContentRenderListener.LocationOnPage> locationalResult
boolean firstRender
| Constructor Detail |
|---|
public LocationAwareTextExtractingPdfContentRenderListener()
| Method Detail |
|---|
public void reset()
Specified by: reset in interface RenderListener
See Also: RenderListener.reset()

public void beginTextBlock()
Description copied from interface: RenderListener
Specified by: beginTextBlock in interface RenderListener
See Also: RenderListener.beginTextBlock()

public void endTextBlock()
Description copied from interface: RenderListener
Specified by: endTextBlock in interface RenderListener
See Also: RenderListener.endTextBlock()

public String getResultantText()
Specified by: getResultantText in interface TextProvidingRenderListener

private void dumpState()

public void renderText(TextRenderInfo renderInfo)
Specified by: renderText in interface RenderListener
Parameters: renderInfo - render info

private void captureChunk(String text)
Parameters: text -

public void renderImage(ImageRenderInfo renderInfo)
Specified by: renderImage in interface RenderListener
Parameters: renderInfo - information specifying what to render
See Also: RenderListener.renderImage(com.itextpdf.text.pdf.parser.ImageRenderInfo)