I have been trying to use PyPDF2's mergePage to apply the same foreground to every page of multiple documents, using the following loop.

for item in file_list:  # loops through 16 pdf files
    print("Processing " + item)

    if item.endswith(".pdf"):
        output_to_file = "/Users/" + getuser() + "/Target/" + item

        background = PdfFileReader(open(source_files + item, "rb"))
        page_count = background.getNumPages()

        for n in range(page_count):
            x, y, w, h = background.getPage(n).mediaBox  # get size of mediaBox
            if w > h:
                foreground = PdfFileReader(open("b_landscape.pdf", "rb"))
            else:
                foreground = PdfFileReader(open("b_portrait.pdf", "rb"))

                input_file = background.getPage(n)
                input_file.mergePage(foreground.getPage(0))
                output.addPage(input_file)

        with open(output_to_file, "wb") as outputStream:
            output.write(outputStream)

The result is a series of PDF files of increasing size: the first file is about 6MB, and after the 16th loop the resulting file is about 70MB. What seems to be happening is that the foreground image is being carried into the next loop. I have tried reinitialising the PageObject (input_file) with

input_file = None

to no avail. If anyone has a suggestion, it would be most appreciated.

About your code, I think that unless I am misunderstanding what you're doing, the input_file stuff should be on the same level as if and else. I don't think that's the issue you're asking about, but it's what I saw first. - James C. Taylor
Thanks James. I think you hit the nail on the head! After posting, I did notice the indent problem. I changed the code to include input_file.compressContentStreams() and handled the outside loop differently, which gave me the result I was looking for. - Pugwash
Cool. I'm going to post my comment as an answer. If you'd vote for it, I'd appreciate it. - James C. Taylor

1 Answer

Unless I am misunderstanding what you're doing, the three input_file lines should be at the same indentation level as the if and else, not inside the else branch. As posted, a page is merged and added to the output only when it is portrait. I don't think that's the issue you're asking about, but it's what I saw first.

from getpass import getuser
from PyPDF2 import PdfFileReader

# file_list, source_files and output are defined elsewhere, as in the question.
for item in file_list:  # loops through 16 pdf files
    print("Processing " + item)

    if item.endswith(".pdf"):
        output_to_file = "/Users/" + getuser() + "/Target/" + item

        background = PdfFileReader(open(source_files + item, "rb"))
        page_count = background.getNumPages()

        for n in range(page_count):
            x, y, w, h = background.getPage(n).mediaBox  # get size of mediaBox
            if w > h:
                foreground = PdfFileReader(open("b_landscape.pdf", "rb"))
            else:
                foreground = PdfFileReader(open("b_portrait.pdf", "rb"))

            # Now at the same level as if/else, so this runs for every page.
            input_file = background.getPage(n)
            input_file.mergePage(foreground.getPage(0))
            output.addPage(input_file)

        with open(output_to_file, "wb") as outputStream:
            output.write(outputStream)
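
As for the growing file sizes: the snippet doesn't show where output is created, so this is a guess, but if it is a single PdfFileWriter made once before the outer loop, every file's pages are added to the same writer and each saved file also contains all the pages from the earlier ones. Here is a rough sketch of how the loop body could be restructured with a fresh writer per file. The names pick_overlay and stamp_file are mine, not from the question, and the calls are the PyPDF2 1.x ones used above (newer releases rename them, e.g. PdfReader, PdfWriter, merge_page, add_page). It also adds the compressContentStreams() call mentioned in the comments.

```python
def pick_overlay(width, height):
    """Return the overlay filename for a page of the given size (hypothetical helper)."""
    return "b_landscape.pdf" if width > height else "b_portrait.pdf"


def stamp_file(source_path, target_path):
    """Merge the matching overlay onto every page of one PDF (hypothetical helper)."""
    # Imported here so pick_overlay can be used without PyPDF2 installed.
    from PyPDF2 import PdfFileReader, PdfFileWriter  # PyPDF2 1.x names

    # Assumption: a fresh writer per file, so pages from earlier files
    # are not written again into later ones.
    output = PdfFileWriter()

    with open(source_path, "rb") as src, \
            open("b_landscape.pdf", "rb") as land, \
            open("b_portrait.pdf", "rb") as port:
        background = PdfFileReader(src)
        # Read each overlay once per file instead of once per page.
        overlays = {
            "b_landscape.pdf": PdfFileReader(land).getPage(0),
            "b_portrait.pdf": PdfFileReader(port).getPage(0),
        }

        for n in range(background.getNumPages()):
            page = background.getPage(n)
            box = page.mediaBox
            # getWidth/getHeight avoid assuming the box starts at (0, 0).
            name = pick_overlay(box.getWidth(), box.getHeight())
            page.mergePage(overlays[name])
            page.compressContentStreams()
            output.addPage(page)

        # Write while the input files are still open; PyPDF2 reads lazily.
        with open(target_path, "wb") as out:
            output.write(out)
```

Inside the original loop this would be called as stamp_file(source_files + item, output_to_file) for each item that ends with ".pdf".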