
I am processing a Word 2010 document using docx4j. I want to print the page number and line number of a string that I am searching for.

For example:

My document contains the string "hello" on page 2 and page 6. My output should look like this:

Hello found at Page 2 - Line 4, Hello found at Page 6 - Line 6.

I tried to do this but failed.

I was able to highlight the text and add a comment by traversing the document, but I could not get its line number or page number.

Note:

1) There are two blank pages, page 3 and page 4.
2) There are no paragraphs at the end or start of pages.

1 Answer


What you're after is a page layout model: to determine accurately which page or line something is on, you need to account for the paper size, headers, footers, gutters, margins, fonts and font sizes, line height, and so on.

docx4j does not have such a model, so you would need to come up with something yourself. A basic word count would be simpler: for example, you can count all the Text objects in a document.
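As a rough illustration of the word-count idea, here is a minimal sketch that reports which paragraph a term falls in and how many words precede that paragraph. It assumes you already have the contents of the main document part (normally `word/document.xml` inside the .docx zip) as a string, and it uses only plain JDK XML classes rather than docx4j's own object model. The class and method names (`TextPositionSketch`, `locate`) are made up for this example. It does not give page or line numbers, since those still need a layout model.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of docx4j: a position estimate based on
// paragraph index and running word count, not on real page layout.
class TextPositionSketch {

    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns one message per paragraph whose text contains the term (case-insensitive). */
    static List<String> locate(String documentXml, String term) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(
            new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));

        // All w:p elements in document order. Caveat: paragraphs nested inside
        // another paragraph (e.g. text boxes) would have their text counted twice.
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        List<String> hits = new ArrayList<>();
        int wordsBefore = 0;

        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);

            // A paragraph's visible text is split across w:t elements, one or more per run.
            NodeList texts = p.getElementsByTagNameNS(W_NS, "t");
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < texts.getLength(); j++) {
                sb.append(texts.item(j).getTextContent());
            }
            String text = sb.toString();

            if (text.toLowerCase().contains(term.toLowerCase())) {
                hits.add(term + " found in paragraph " + (i + 1)
                    + " after " + wordsBefore + " words");
            }

            // Update the running word count for the paragraphs that follow.
            String trimmed = text.trim();
            if (!trimmed.isEmpty()) {
                wordsBefore += trimmed.split("\\s+").length;
            }
        }
        return hits;
    }
}
```

To try this on a real file, the XML string could be read from the `word/document.xml` entry of the .docx using `java.util.zip.ZipFile`. Joining the text of all runs before searching matters because Word often splits a single word across several runs.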

One approach may be to look at how the PDF rendering functionality in docx4j works. The FOP rendering code in docx4j may offer some clues, at least about mapping content to pages:

https://github.com/plutext/docx4j/tree/master/src/main/java/org/docx4j/convert/out/fo