
I compare two PDF files and mark the differences with highlights. When I use PDFBox to merge them side by side for comparison, the highlights are missing from the result.

combine pdfs

I am using this function, which merges two PDF files by placing each pair of corresponding pages side by side:

void generateSideBySidePDF() {
    File pdf1File = new File(FILE1_PATH);
    File pdf2File = new File(FILE2_PATH);
    File outPdfFile = new File(OUTFILE_PATH);
    PDDocument pdf1 = null;
    PDDocument pdf2 = null;
    PDDocument outPdf = null;
    try {

        pdf1 = PDDocument.load(pdf1File);
        pdf2 = PDDocument.load(pdf2File);

        outPdf = new PDDocument();
        for(int pageNum = 0; pageNum < pdf1.getNumberOfPages(); pageNum++) {
            // Create output PDF frame
            PDRectangle pdf1Frame = pdf1.getPage(pageNum).getCropBox();
            PDRectangle pdf2Frame = pdf2.getPage(pageNum).getCropBox();
            PDRectangle outPdfFrame = new PDRectangle(pdf1Frame.getWidth()+pdf2Frame.getWidth(), Math.max(pdf1Frame.getHeight(), pdf2Frame.getHeight()));

            // Create output page with calculated frame and add it to the document
            COSDictionary dict = new COSDictionary();
            dict.setItem(COSName.TYPE, COSName.PAGE);
            dict.setItem(COSName.MEDIA_BOX, outPdfFrame);
            dict.setItem(COSName.CROP_BOX, outPdfFrame);
            dict.setItem(COSName.ART_BOX, outPdfFrame);
            PDPage outPdfPage = new PDPage(dict);
            outPdf.addPage(outPdfPage);

            // Source PDF pages has to be imported as form XObjects to be able to insert them at a specific point in the output page
            LayerUtility layerUtility = new LayerUtility(outPdf);
            PDFormXObject formPdf1 = layerUtility.importPageAsForm(pdf1, pageNum);
            PDFormXObject formPdf2 = layerUtility.importPageAsForm(pdf2, pageNum);

            // Add form objects to output page
            AffineTransform afLeft = new AffineTransform();
            layerUtility.appendFormAsLayer(outPdfPage, formPdf1, afLeft, "left" + pageNum);
            AffineTransform afRight = AffineTransform.getTranslateInstance(pdf1Frame.getWidth(), 0.0);
            layerUtility.appendFormAsLayer(outPdfPage, formPdf2, afRight, "right" + pageNum);
        }
        outPdf.save(outPdfFile);

    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            if (pdf1 != null) pdf1.close();
            if (pdf2 != null) pdf2.close();
            if (outPdf != null) outPdf.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
The highlights could be annotations. These are ignored IIRC. IMHO it wouldn't make much sense to keep these. - Tilman Hausherr
@TilmanHausherr I am using PDFBox 2.0.8. Do you know how to fix this? - devil8910
@TilmanHausherr Here is a sample file. It has highlights, but after merging the two PDF files they are missing. drive.google.com/file/d/1rJ6iql8oRrFo0gsha41ZwNuwcykmWzlk/… - devil8910
I just ran it using that single file as sources for both and the highlights are there. However I used the development trunk. I retested with 2.0.11 and it worked too. I tried with 2.0.8 and it worked too, the green stuff is highlighted. Here's the result file: filedropper.com/new-result - Tilman Hausherr
Indeed, that one - as I suspected in my first comment - has annotations. The file you sent did not. It is kind of weird that you sent me a different file than the one you are testing with. I suspect you "printed" the page into a new PDF, and by doing so the annotations became normal content. So a quick solution for you would be to do that for the entire file before running your tool. A more difficult solution would be to copy all the annotations and reposition the ones from the right page (by modifying the annotation rectangle). - Tilman Hausherr

1 Answer


Insert this into your code after the "Source PDF pages has to be imported" segment to copy the annotations. Annotations from the right-hand PDF must have their rectangles shifted to the right by the width of the left page.

// copy annotations
PDPage src1Page = pdf1.getPage(pageNum);
PDPage src2Page = pdf2.getPage(pageNum);
for (PDAnnotation ann : src1Page.getAnnotations())
{
    outPdfPage.getAnnotations().add(ann);                
}
for (PDAnnotation ann : src2Page.getAnnotations())
{
    PDRectangle rect = ann.getRectangle();
    ann.setRectangle(new PDRectangle(rect.getLowerLeftX() + pdf1Frame.getWidth(), rect.getLowerLeftY(), rect.getWidth(), rect.getHeight()));
    outPdfPage.getAnnotations().add(ann);                
}

Note that this code has a flaw: it works only for annotations that have an appearance stream (most do). For those that don't, it will have weird effects; in that case, one would have to adjust the coordinates depending on the annotation type. For highlights, that would be the quadpoints; for lines, the line coordinates; and so on.
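
The per-type coordinate adjustment mentioned above could look roughly like the following. This is only a minimal sketch, not tested against real files: the class name `CoordinateShift` and the method `shiftX` are hypothetical helpers, not part of PDFBox. It assumes the coordinates come as a flat array of alternating x and y values, which is how quadpoints and line coordinates are stored in a PDF, and it deliberately uses no PDFBox classes so that it can be run on its own.

```java
import java.util.Arrays;

public class CoordinateShift {

    /**
     * Returns a copy of a flat [x0, y0, x1, y1, ...] coordinate array
     * with dx added to every x value. The y values are left unchanged.
     * (Hypothetical helper, not a PDFBox API.)
     */
    static float[] shiftX(float[] coords, float dx) {
        if (coords.length % 2 != 0) {
            throw new IllegalArgumentException("expected alternating x/y values");
        }
        float[] shifted = Arrays.copyOf(coords, coords.length);
        for (int i = 0; i < shifted.length; i += 2) {
            shifted[i] += dx; // even indices hold the x coordinates
        }
        return shifted;
    }

    public static void main(String[] args) {
        // One quadrilateral (8 numbers), moved 100 units to the right.
        float[] quad = {10f, 20f, 30f, 20f, 10f, 5f, 30f, 5f};
        System.out.println(Arrays.toString(shiftX(quad, 100f)));
    }
}
```

In the second loop of the answer, this could be used for highlights along the lines of `markup.setQuadPoints(CoordinateShift.shiftX(markup.getQuadPoints(), pdf1Frame.getWidth()))` after checking `ann instanceof PDAnnotationTextMarkup` and that the quadpoints are not null. For `PDAnnotationLine`, the same idea should apply with `getLine()` and `setLine()`.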