iText 7 Html to Pdf conversion and linking external file to the generated pdf

Question

I am encountering an issue while merging two PDFs generated out of IText. I am new to iText7 I am creating one pdf from html and creating another pdf with excel(.xls) as embedded document to pdf. I want to merge the 2 files.

Basically I want to generate a PDF from html then attach a excel document to it and then output combined html outPutStream from these two pdfs.

Below is the code I am using

    ByteArrayOutputStream htmlToPdfContent = new ByteArrayOutputStream();
    PdfWriter writer = new PdfWriter(htmlToPdfContent);
    PdfDocument pdf = new PdfDocument(writer);
    pdf.setTagged();
    PageSize pageSize = PageSize.A4.rotate();
    pdf.setDefaultPageSize(pageSize);
    ConverterProperties properties = new ConverterProperties();
    HtmlConverter.convertToPdf(htmlContent, pdf, properties);

    FileUtils.cleanDirectory(new File(outputDir));

    ByteArrayOutputStream pdfResult = new ByteArrayOutputStream();
    PdfWriter writerResult = new PdfWriter(pdfResult);
    PdfDocument pdfDocResult = new PdfDocument(writerResult);

    PdfReader reader = new PdfReader(new ByteArrayInputStream(htmlToPdfContent.toByteArray()));
    PdfDocument pdfDoc = new PdfDocument(reader);
    pdfDoc.copyPagesTo(1, pdfDoc.getNumberOfPages(), pdfDocResult);

    ByteArrayOutputStream pdfAttach = new ByteArrayOutputStream();
    PdfDocument pdfLaunch = new PdfDocument(new PdfWriter(pdfAttach));
    Rectangle rect = new Rectangle(36, 700, 100, 100);
    byte[] embeddedFileContentBytes = Files.readAllBytes(Paths.get(excelPath));
    PdfFileSpec fs = PdfFileSpec.createEmbeddedFileSpec(pdfLaunch, embeddedFileContentBytes, null, "test.xlsx", null, null);
    PdfAnnotation attachment = new PdfFileAttachmentAnnotation(rect, fs)
            .setContents("Click me");
    pdfLaunch.addNewPage().addAnnotation(attachment);

    PdfDocument appliedChanges = new PdfDocument(new PdfReader(new ByteArrayInputStream(pdfAttach.toByteArray())));

    appliedChanges.copyPagesTo(1, appliedChanges.getNumberOfPages(), pdfDocResult);
    try(OutputStream outputStream = new FileOutputStream(dest)) {
        pdfResult.writeTo(outputStream);
    }

This is throwing exception

13:56:05.724 [main] ERROR com.itextpdf.kernel.pdf.PdfReader - Error occurred while reading cross reference table. Cross reference table will be rebuilt.
com.itextpdf.io.IOException: Error at file pointer 19,272.
    at com.itextpdf.io.source.PdfTokenizer.throwError(PdfTokenizer.java:678)
    at com.itextpdf.kernel.pdf.PdfReader.readXrefSection(PdfReader.java:801)
    at com.itextpdf.kernel.pdf.PdfReader.readXref(PdfReader.java:774)
    at com.itextpdf.kernel.pdf.PdfReader.readPdf(PdfReader.java:538)
    at com.itextpdf.kernel.pdf.PdfDocument.open(PdfDocument.java:1818)
    at com.itextpdf.kernel.pdf.PdfDocument.<init>(PdfDocument.java:238)
    at com.itextpdf.kernel.pdf.PdfDocument.<init>(PdfDocument.java:221)
    at com.mediaocean.prisma.order.command.infrastructure.pdf.itext.PdfAttachmentLaunch.main(PdfAttachmentLaunch.java:76)
Caused by: com.itextpdf.io.IOException: xref subsection not found.
    ... 8 common frames omitted
Exception in thread "main" com.itextpdf.kernel.PdfException: Trailer not found.
    at com.itextpdf.kernel.pdf.PdfReader.rebuildXref(PdfReader.java:1064)
    at com.itextpdf.kernel.pdf.PdfReader.readPdf(PdfReader.java:543)
    at com.itextpdf.kernel.pdf.PdfDocument.open(PdfDocument.java:1818)
    at com.itextpdf.kernel.pdf.PdfDocument.<init>(PdfDocument.java:238)
    at com.itextpdf.kernel.pdf.PdfDocument.<init>(PdfDocument.java:221)
    at com.mediaocean.prisma.order.command.infrastructure.pdf.itext.PdfAttachmentLaunch.main(PdfAttachmentLaunch.java:88)
13:56:05.773 [main] ERROR com.itextpdf.kernel.pdf.PdfReader - Error occurred while reading cross reference table. Cross reference table will be rebuilt.
com.itextpdf.io.IOException: PDF startxref not found.
    at com.itextpdf.io.source.PdfTokenizer.getStartxref(PdfTokenizer.java:262)
    at com.itextpdf.kernel.pdf.PdfReader.readXref(PdfReader.java:753)
    at com.itextpdf.kernel.pdf.PdfReader.readPdf(PdfReader.java:538)
    at com.itextpdf.kernel.pdf.PdfDocument.open(PdfDocument.java:1818)
    at com.itextpdf.kernel.pdf.PdfDocument.<init>(PdfDocument.java:238)
    at com.itextpdf.kernel.pdf.PdfDocument.<init>(PdfDocument.java:221)
    at com.mediaocean.prisma.order.command.infrastructure.pdf.itext.PdfAttachmentLaunch.main(PdfAttachmentLaunch.java:88)

Please advise. Thanks in advance !!

mkl mkl · Accepted Answer · 2020-08-19T11:25:04

Concerning revision 2 of your question

You changed your code differently than proposed in my answer to the first revision of your question, you now convert into the formerly unused PdfDocument pdf instead of directly into the ByteArrayOutputStream htmlToPdfContent.

This actually also is a possible fix of the problem identified in that answer. Thus, you don't get an exception here anymore:

PdfReader reader = new PdfReader(new ByteArrayInputStream(htmlToPdfContent.toByteArray()));
PdfDocument pdfDoc = new PdfDocument(reader);

Instead you now get an exception further down the flow, here:

PdfDocument appliedChanges = new PdfDocument(new PdfReader(new ByteArrayInputStream(pdfAttach.toByteArray())));

And the reason is simple, you have not yet closed the PdfDocument pdfLaunch which writes to the ByteArrayOutputStream pdfAttach. But only closing finalizes the PDF in the output stream. Thus, add the close():

ByteArrayOutputStream pdfAttach = new ByteArrayOutputStream();
PdfDocument pdfLaunch = new PdfDocument(new PdfWriter(pdfAttach));
[...]
pdfLaunch.addNewPage().addAnnotation(attachment);
pdfLaunch.close(); //<==== added

PdfDocument appliedChanges = new PdfDocument(new PdfReader(new ByteArrayInputStream(pdfAttach.toByteArray())));

And you actually do the same mistake again, shortly after, you store the contents of the ByteArrayOutputStream pdfResult to outputStream without closing the PdfDocument pdfDocResult which writes to pdfResult. Thus, also add a close call there:

appliedChanges.copyPagesTo(1, appliedChanges.getNumberOfPages(), pdfDocResult);
pdfDocResult.close(); //<==== added
try(OutputStream outputStream = new FileOutputStream(dest)) {
    pdfResult.writeTo(outputStream);
}

Concerning revision 1 of your question

You use the ByteArrayOutputStream htmlToPdfContent as target of two distinct PDF generators, the PdfDocument pdf via the PdfWriter writer and the HtmlConverter.convertToPdf call:

ByteArrayOutputStream htmlToPdfContent = new ByteArrayOutputStream();
PdfWriter writer = new PdfWriter(htmlToPdfContent);
PdfDocument pdf = new PdfDocument(writer);
pdf.setTagged();
PageSize pageSize = PageSize.A4.rotate();
pdf.setDefaultPageSize(pageSize);
ConverterProperties properties = new ConverterProperties();
HtmlConverter.convertToPdf(content, htmlToPdfContent, properties);

This makes the content of htmlToPdfContent a hodgepodge of the outputs of both of them, in particular not a valid PDF.

As you don't add any content to pdf, you can safely remove it and reduce the above excerpt to

ByteArrayOutputStream htmlToPdfContent = new ByteArrayOutputStream();
ConverterProperties properties = new ConverterProperties();
HtmlConverter.convertToPdf(content, htmlToPdfContent, properties);

iText 7 Html to Pdf conversion and linking external file to the generated pdf

1 Answers

Concerning revision 2 of your question

Concerning revision 1 of your question