Crop a PDF page to content

Question

Using Python, is it possible to crop a PDF page to its content, the way Inkscape does it (as shown in the image in the original post)? The bounding area of the content should be found automatically.

Using PyPDF2 I can crop the page, but I have to find the coordinates manually, which is tedious for a large number of files. Inkscape finds them automatically.

The code I'm using is shown below and an example input file is available here.

# Python 3.7.0
import PyPDF2  # version 1.26.0

with open('document-1.pdf', 'rb') as fin:
    pdf = PyPDF2.PdfFileReader(fin)
    page = pdf.getPage(0)

    # Coordinates found by inspection (PDF points, origin at bottom-left).
    # Can these coordinates be found automatically?
    page.cropBox.lowerLeft = (88, 322)
    page.cropBox.upperRight = (508, 602)

    output = PyPDF2.PdfFileWriter()
    output.addPage(page)

    # Write while `fin` is still open: PyPDF2 reads page objects lazily.
    with open('cropped-1.pdf', 'wb') as fo:
        output.write(fo)
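One way to find those coordinates automatically, sketched here under stated assumptions: render the page to a grayscale image (for example with pdftoppm or PyMuPDF; that step is not shown), find the bounding box of the non-white pixels, and convert it from pixels (origin top-left) to PDF points (origin bottom-left). The helper names `content_bbox` and `pixels_to_cropbox` are hypothetical, the raster is assumed to be a plain list of rows of 0-255 values, and the conversion assumes the MediaBox starts at (0, 0) and the page is not rotated. Because it measures visible ink, white-on-white objects are ignored.

```python
def content_bbox(pixels, threshold=250):
    """Return (left, top, right, bottom) pixel bounds of the non-white area.

    `pixels` is a list of rows of grayscale values (0 = black, 255 = white),
    with row 0 at the top of the page. A pixel counts as content when its
    value is below `threshold`. `right` and `bottom` are exclusive.
    Returns None for a blank page.
    """
    rows = [i for i, row in enumerate(pixels) if any(v < threshold for v in row)]
    if not rows:
        return None
    cols = [j for j in range(len(pixels[0]))
            if any(row[j] < threshold for row in pixels)]
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1


def pixels_to_cropbox(bbox, dpi, page_height_pt):
    """Convert a pixel bbox (top-left origin) to PDF points (bottom-left origin).

    Returns ((llx, lly), (urx, ury)), the shape used for
    cropBox.lowerLeft and cropBox.upperRight in the question's code.
    """
    scale = 72.0 / dpi  # PDF user space has 72 points per inch
    left, top, right, bottom = bbox
    lower_left = (left * scale, page_height_pt - bottom * scale)
    upper_right = (right * scale, page_height_pt - top * scale)
    return lower_left, upper_right


if __name__ == "__main__":
    # A tiny 4x6 "page" rendered at 72 dpi, so 1 pixel = 1 point.
    demo = [
        [255, 255, 255, 255, 255, 255],
        [255, 255,   0,   0, 255, 255],
        [255, 255,   0, 255, 255, 255],
        [255, 255, 255, 255, 255, 255],
    ]
    box = content_bbox(demo)  # (2, 1, 4, 3)
    print(pixels_to_cropbox(box, dpi=72, page_height_pt=4))
    # prints ((2.0, 1.0), (4.0, 3.0))
```

For a real page you would then assign the two returned tuples to `page.cropBox.lowerLeft` and `page.cropBox.upperRight`, taking the page height from `page.mediaBox` and adding a small margin if you want some whitespace left.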

StevenClontz · Accepted Answer · 2020-07-04T15:44:46

I was able to do this with pdfCropMargins, a pip-installable command-line tool: https://pypi.org/project/pdfCropMargins/

Since I originally answered, a Python interface has been added: https://github.com/abarker/pdfCropMargins#python-interface (h/t @Paul)

My original answer, which calls it from the command line, follows.

At the time, I didn't find a clean way to call it directly from a script, so I used os.system.

$ python -m pip install pdfCropMargins --user
$ pdf-crop-margins document.pdf -o output.pdf -p 0

import os
os.system('pdf-crop-margins document.pdf -o output.pdf -p 0')
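Since the question mentions a large number of files, here is a sketch of batch processing with `subprocess.run` and an argument list instead of `os.system`. Passing a list (no shell) avoids quoting problems with file names that contain spaces, and `check=True` raises `CalledProcessError` if the tool exits with an error. The helper names and the `-cropped` output naming are hypothetical; the flags are the ones used above, where `-p`, as I understand the tool's help, sets the percentage of the existing margin to retain (0 crops tightly to the content).

```python
import subprocess
from pathlib import Path


def crop_command(src, dst, percent_retain=0):
    """Build the pdf-crop-margins argument list (same flags as above)."""
    return ["pdf-crop-margins", str(src), "-o", str(dst), "-p", str(percent_retain)]


def crop_all(paths, suffix="-cropped", run=subprocess.run):
    """Crop each PDF in `paths`, writing e.g. doc.pdf -> doc-cropped.pdf.

    `run` defaults to subprocess.run; it is a parameter only so the
    generated commands can be inspected without invoking the tool.
    """
    for src in map(Path, paths):
        dst = src.with_name(src.stem + suffix + src.suffix)
        # No shell involved, so file names need no quoting;
        # check=True raises CalledProcessError on a non-zero exit status.
        run(crop_command(src, dst), check=True)
```

Usage would be something like `crop_all(sorted(Path(".").glob("*.pdf")))`, which requires `pdf-crop-margins` to be on your `PATH`.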
