
I'm having trouble getting this code to work. The goal is to merge PDF files into a PDF that is already loaded in a PDDocument object. I don't want to use PDFBox's merge utility (PDFMergerUtility) because it requires closing the PDDocument object. I have a lot of data to process and I do it in a loop; loading and closing a PDDocument each time would take too much time and too many resources (maybe I'm wrong, but that's how it feels).

Here is how I do it:

for (String path:pathList) {
    /* ... */
    if(path.endsWith("pdf")){
        File pdfToMerge = new File(path);
        try(PDDocument pdfToMergeDocument = PDDocument.load(pdfToMerge)){
            for (int pageIndex = 0; pageIndex < pdfToMergeDocument.getNumberOfPages(); pageIndex++){
                PDPage page = pdfToMergeDocument.getPage(pageIndex);
                doc.addPage(page);
            }
        }catch (IOException e){
            System.out.println("Pdf : " + path + ANSI_RED + "  [FAILED]" + ANSI_RESET);
            continue;
        }finally {
            System.out.println("Pdf : " + path + ANSI_GREEN +"  [OK]" + ANSI_RESET);
        }
    }
}
doc.save("src/Kairos/OutPut/"+pdfName[pdfName.length - 1]+".pdf");
doc.close();

The error happens when I try to save the document, on line 65 of Main.java (the doc.save call).

I get this error message:

Exception in thread "main" java.io.IOException: COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
at org.apache.pdfbox.cos.COSStream.checkClosed(COSStream.java:83)
at org.apache.pdfbox.cos.COSStream.createRawInputStream(COSStream.java:133)
at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1214)
at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:402)
at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:158)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:521)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:459)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:443)
at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1108)
at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:449)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1381)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1268)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1334)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1305)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1293)
at Kairos.Main.main(Main.java:65)
@FedericoklezCulloca doc is declared at the beginning of the file; it's created with the CreatePDFA class from Apache's examples. I checked that the code works without this part: everything is fine, I get no error and I can save the document. The problem really is in this block. If you want, I can edit my post to add the full code. - Hugo Chittaro
To summarize the answer: close pdfToMergeDocument only after saving doc. - Tilman Hausherr
@TilmanHausherr please re-read my answer below. There's a loop in there, which closes doc at the end of the first iteration. On the second iteration save fails because doc is closed. - Federico klez Culloca
Oops, yes, indeed. (But my comment may still apply.) - Tilman Hausherr
@TilmanHausherr indeed. I just re-read the documentation for the PDDocument::addPage method. It doesn't make it clear, but it does not make a copy. I'll amend my answer with a solution to this later. Thanks for your comments. - Federico klez Culloca

1 Answer


Consider this: you have a list of Strings in pathList and you iterate through it.

At the end of the first iteration you save doc and you close it.

Then you loop again and try to save doc, which you already closed in the previous iteration.

If your objective is to put the contents of all the PDFs in pathList inside the PDF pointed to by doc, you have to save and close it outside the loop, after you have looped over all of pathList.

EDIT:

As pointed out by Tilman Hausherr, there's another problem. When you call addPage you're not making a copy of the original page; you're more or less linking to it. Since you're using a try-with-resources construct, the source document is closed as soon as you leave the try block, and the streams backing the pages you linked are closed with it. That is what the "COSStream has been closed" message in your stack trace is reporting when save tries to read them. So either you save before leaving the try block, or you use importPage instead, which makes a copy (and then calls addPage itself). For example:

PDPage page = pdfToMergeDocument.getPage(pageIndex);
doc.importPage(page);
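
Putting both points together, here is a minimal sketch of the whole loop, assuming the PDFBox 2.x API (PDDocument.load, as in the question). The class name MergeSketch, the method appendAndSave and the outputPath parameter are made up for illustration. I'm not certain importPage copies everything a page references (fonts and images, for instance), so the sketch also follows Tilman Hausherr's comment and keeps every source document open until doc has been saved:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Hypothetical helper class, not part of PDFBox.
public class MergeSketch {

    /**
     * Imports the pages of every PDF in pathList into target, then saves target.
     * The source documents are closed only after the save has finished.
     * The caller stays responsible for closing target.
     */
    static void appendAndSave(PDDocument target, List<String> pathList, String outputPath)
            throws IOException {
        List<PDDocument> openSources = new ArrayList<>();
        try {
            for (String path : pathList) {
                if (!path.endsWith("pdf")) {
                    continue;
                }
                PDDocument source;
                try {
                    source = PDDocument.load(new File(path)); // PDFBox 2.x call
                } catch (IOException e) {
                    System.out.println("Pdf : " + path + "  [FAILED]");
                    continue; // skip unreadable files, as the question does
                }
                openSources.add(source); // keep it open until after save
                for (PDPage page : source.getPages()) {
                    target.importPage(page);
                }
                System.out.println("Pdf : " + path + "  [OK]");
            }
            // Save once, after the loop, while all sources are still open.
            target.save(outputPath);
        } finally {
            for (PDDocument source : openSources) {
                try {
                    source.close();
                } catch (IOException ignored) {
                    // nothing useful to do if closing a source fails
                }
            }
        }
    }
}
```

You would call it once with your already loaded doc, e.g. MergeSketch.appendAndSave(doc, pathList, outputPath), and close doc afterwards. The cost is that all source documents stay in memory at the same time, which may matter if pathList is very long.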

EDIT 2:

Of course this answer is now wrong because OP posted the wrong code in the original question :) I'll leave this here in case anyone needs it.