I have a problem with 'XWPFDocument'. My part of the program takes several .docx files and copies all of their content into one output .docx file, including text, tables, pictures and formulas. This works well, but recently I hit a bug: one picture was not copied into the result. This is the source, and this is the result. In the result you can see that the images in section 3.1.6.2 were copied successfully, but the one in 3.1.6.1 was not.
This is how I do it:
```groovy
import java.io.ByteArrayInputStream
import org.apache.poi.xwpf.usermodel.*

for (XWPFRun run : oldParagraph.getRuns()) {
    XWPFRun newRun = newParagraph.createRun()
    if (run.getText(0) != null && !run.getText(0).isEmpty()) {
        // .... copy text ....
    }
    // Copy every picture that POI reports for this run
    if (run.getEmbeddedPictures() != null && run.getEmbeddedPictures().size() > 0) {
        for (XWPFPicture pic : run.getEmbeddedPictures()) {
            byte[] img = pic.getPictureData().getData()
            // Picture size in EMU, read from the picture's shape properties
            long cx = pic.getCTPicture().getSpPr().getXfrm().getExt().getCx()
            long cy = pic.getCTPicture().getSpPr().getXfrm().getExt().getCy()
            int pictureType = pic.getPictureData().getPictureType()
            // Add the image bytes to the target document, then reference them from a new picture
            XWPFDocument document = newParagraph.getDocument()
            String blipId = document.addPictureData(new ByteArrayInputStream(img), pictureType)
            // createPictureCxCy is a helper that is not shown here
            createPictureCxCy(document, blipId, document.getNextPicNameNumber(pictureType), cx, cy)
        }
    }
}
```
The key point here is:

```groovy
for (XWPFPicture pic : run.getEmbeddedPictures())
```
I get the embedded pictures from the 'run'. In the bad file I have 5 paragraphs with 1 run each; 4 of them contain text and 1 is empty. Usually it is exactly such an empty run that holds the embedded picture, and judging by the order, the picture should be in this one. But here the run contains no picture at all. Yet the picture does exist in the XWPFDocument, in its 'pictures' and 'packagePictures' lists.
The problem: these lists contain 'XWPFPictureData' objects, which carry no information about the picture's location in the document or its scale, whereas 'run.getEmbeddedPictures()' returns 'XWPFPicture' objects, which is what I need. Is there any way around this?
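If the picture turns out to be a DrawingML picture, its display size is not stored in the image part at all. 'XWPFPictureData' only wraps the image bytes, while the 'wp:inline' (inline) or 'wp:anchor' (floating) element that references the image by relationship ID carries a 'wp:extent' with 'cx' and 'cy' in EMU. The sketch below is a hypothetical helper ('BlipExtents' is not part of POI) that reads the raw word/document.xml with only the JDK and maps each 'a:blip' 'r:embed' ID to that extent. To match the result with an 'XWPFPictureData', 'document.getRelationId(pictureData)', which XWPFDocument inherits from POIXMLDocumentPart, should return the same rId for pictures of the main document part.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads word/document.xml directly, not through POI.
class BlipExtents {
    static final String A = "http://schemas.openxmlformats.org/drawingml/2006/main";
    static final String R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    static final String WP = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing";

    /** Maps each a:blip r:embed id to "inline|anchor cx x cy EMU" from its wp:inline or wp:anchor. */
    static Map<String, String> blipExtents(InputStream documentXml) {
        Document doc;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for the *NS lookups below
            doc = f.newDocumentBuilder().parse(documentXml);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document.xml", e);
        }
        // If the same image is used twice, only its last occurrence is kept
        Map<String, String> result = new LinkedHashMap<>();
        NodeList blips = doc.getElementsByTagNameNS(A, "blip");
        for (int i = 0; i < blips.getLength(); i++) {
            Element blip = (Element) blips.item(i);
            String rId = blip.getAttributeNS(R, "embed");
            // Climb to the wp:inline or wp:anchor that places the picture in the text
            Node n = blip.getParentNode();
            while (n != null && !isPlacement(n)) {
                n = n.getParentNode();
            }
            if (n == null) {
                result.put(rId, "no wp:inline or wp:anchor ancestor");
                continue;
            }
            Element extent = directChild((Element) n, WP, "extent");
            result.put(rId, extent == null
                    ? n.getLocalName() + " without wp:extent"
                    : n.getLocalName() + " " + extent.getAttribute("cx") + " x "
                            + extent.getAttribute("cy") + " EMU");
        }
        return result;
    }

    static boolean isPlacement(Node n) {
        return WP.equals(n.getNamespaceURI())
                && ("inline".equals(n.getLocalName()) || "anchor".equals(n.getLocalName()));
    }

    static Element directChild(Element parent, String ns, String local) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && ns.equals(c.getNamespaceURI()) && local.equals(c.getLocalName())) {
                return (Element) c;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Assumes the main part has its usual name, word/document.xml
        try (ZipFile zip = new ZipFile(args[0]);
             InputStream in = zip.getInputStream(zip.getEntry("word/document.xml"))) {
            blipExtents(in).forEach((id, info) -> System.out.println(id + " -> " + info));
        }
    }
}
```

Reading 'wp:extent' rather than 'pic:spPr/a:xfrm' also gives the size Word uses for layout, which is what a copy would most likely want to reproduce.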
Update, in response to the first comment:
I checked with this code:
```groovy
// Print every run of the body-level paragraphs and how many pictures POI found in it
for (XWPFParagraph paragraph : document.getParagraphs()) {
    for (XWPFRun run : paragraph.getRuns()) {
        println "run text: " + run.getText(0)
        println "embedded picture count: " + run.getEmbeddedPictures().size()
    }
}
println "*** for document picture count: " + document.allPictures.size()
```
The result was:

```
run text: 3.1.6.1 In a number of regions, seismic loads on the jack-up rig ...
embedded picture count: 0
run text: Integral seismic action on the jack-up rig ...
embedded picture count: 0
run text: null
embedded picture count: 0
run text: Fig. 3.1.6.1 Generalized dynamic factor: ...
embedded picture count: 0
run text: Р01 - lowest frequency of horizontal oscillations
embedded picture count: 0
*** for document picture count: 4
```
I have no idea why the picture count is 4. Second, about the anchor: I did not find one, and I did not find one in the other files that are processed correctly either. In one article I read: "Objects can be placed in your document in two ways: either inline or floating." Only floating objects have an anchor.
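Both questions can be checked without Word by looking at the raw package. As far as I understand POI, 'getAllPictures()' is built from the image relationships of the main document part, not from the runs, so a count of 4 could include images referenced by VML shapes, preview images of OLE objects (such as old-style equations), or images nothing references any more. The sketch below is a hypothetical diagnostic ('ImageReferencePaths' is not part of POI) that reads word/_rels/document.xml.rels and word/document.xml with only the JDK and, for each image relationship, prints the element path below 'w:body' that references it. 'wp:inline' in the path means an inline picture and 'wp:anchor' a floating one. Segments such as 'mc:AlternateContent', 'w:pict' or 'w:object' (VML), 'w:tbl', or 'w:txbxContent' would each be a plausible reason why none of the runs printed above reports the picture: as far as I can tell, XWPFRun only collects 'pic:pic' elements under the run's direct 'w:drawing' and 'w:pict' children, and 'getParagraphs()' covers only body-level paragraphs.

```java
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical diagnostic: reads the raw package parts with the JDK only, not through POI.
class ImageReferencePaths {
    static final String R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Document parse(InputStream in) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(in);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML part", e);
        }
    }

    /** Maps "rId (target)" of every image relationship to the paths of the elements referencing it. */
    static Map<String, List<String>> imagePaths(InputStream relsXml, InputStream documentXml) {
        Map<String, String> keyById = new HashMap<>();
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList rels = parse(relsXml).getElementsByTagNameNS("*", "Relationship");
        for (int i = 0; i < rels.getLength(); i++) {
            Element rel = (Element) rels.item(i);
            if (rel.getAttribute("Type").endsWith("/image")) {
                String key = rel.getAttribute("Id") + " (" + rel.getAttribute("Target") + ")";
                keyById.put(rel.getAttribute("Id"), key);
                result.put(key, new ArrayList<>());
            }
        }
        NodeList all = parse(documentXml).getElementsByTagNameNS("*", "*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            // DrawingML references images with r:embed (on a:blip), VML with r:id (on v:imagedata)
            for (String attr : new String[] {"embed", "id"}) {
                String key = keyById.get(el.getAttributeNS(R, attr));
                if (key != null) {
                    result.get(key).add(pathBelowBody(el));
                }
            }
        }
        return result;
    }

    /** Element names from just below w:body down to el, e.g. w:p/w:r/w:drawing/wp:inline/... */
    static String pathBelowBody(Node el) {
        Deque<String> parts = new ArrayDeque<>();
        for (Node n = el; n instanceof Element && !(W.equals(n.getNamespaceURI())
                && "body".equals(n.getLocalName())); n = n.getParentNode()) {
            parts.addFirst(n.getNodeName());
        }
        return String.join("/", parts);
    }

    public static void main(String[] args) throws Exception {
        // Assumes the usual part names word/document.xml and word/_rels/document.xml.rels
        try (ZipFile zip = new ZipFile(args[0]);
             InputStream rels = zip.getInputStream(zip.getEntry("word/_rels/document.xml.rels"));
             InputStream doc = zip.getInputStream(zip.getEntry("word/document.xml"))) {
            imagePaths(rels, doc).forEach((key, paths) -> System.out.println(key + " <- " + paths));
        }
    }
}
```

Run it as 'java ImageReferencePaths file.docx'. An empty list for a relationship means that image part is not referenced from document.xml at all, which would also inflate the count.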
Open the Word file in Microsoft Word. Select the missing picture and check whether a small anchor symbol appears somewhere. It points to the paragraph containing the run the picture is anchored to. – Axel Richter