I have an old Kindle Dx. Owing to disabilities, I can't use tablets or other touch devices, so I transfer pdfs to the Kindle to read them. Most files need pre-processing first.
What is a good option to pre-process pdfs without rasterizing them?
[When rasterizing is acceptable:
k2pdfopt -mode copy for maps or for small text. This rasterizes, enhances contrast, and makes everything 1.4-compatible.
k2pdfopt -mode copy -dev dx for other works. This rasterizes to 800x1080, downsamples as needed, enhances contrast while making everything grayscale, and makes everything 1.4-compatible.
When rasterizing text is not acceptable:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -sstdout=%stderr -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf if you want to preserve graphics. This makes minimal changes to make everything 1.4-compatible.
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -g800x1080 -r150 -dPDFFitPage -dFastWebView -sColorConversionStrategy=RGB -dDownsampleColorImages=true -dDownsampleGrayImages=true -dDownsampleMonoImages=true -dColorImageResolution=150 -dGrayImageResolution=150 -dMonoImageResolution=300 -dColorImageDownsampleThreshold=1.0 -dGrayImageDownsampleThreshold=1.0 -dMonoImageDownsampleThreshold=1.0 -sstdout=%stderr -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf if you want moderate downsampling. This downsamples existing raster images to fit 800x1080 and makes everything 1.4-compatible.
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -g800x1080 -r150 -dPDFFitPage -dFastWebView -sColorConversionStrategy=Gray -dDownsampleColorImages=true -dDownsampleGrayImages=true -dDownsampleMonoImages=true -dColorImageResolution=75 -dGrayImageResolution=75 -dMonoImageResolution=150 -dColorImageDownsampleThreshold=1.0 -dGrayImageDownsampleThreshold=1.0 -dMonoImageDownsampleThreshold=1.0 -sstdout=%stderr -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf if you want more aggressive downsampling. This downsamples raster images to fit 400x540, makes them grayscale, and makes everything 1.4-compatible. Image quality is low, but usually still recognizable.
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dFILTERIMAGE -dFILTERVECTOR -sstdout=%stderr -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf if you want to cut all graphics.
If you use any of these options to pre-process for another device, check its screen size in pixels. Don't worry too much about pixels per inch.]
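The four Ghostscript recipes above differ only in a few flags, so they can be wrapped in one small shell function. This is a minimal sketch, not a tested tool: the function names (kindle_gs, downsample_flags) and the mode names (keep, moderate, aggressive, text) are made up for the example, and the gs flags are simply the ones listed above.

```shell
# Flags shared by the moderate and aggressive recipes.
# $1 = colour strategy (RGB or Gray), $2 = colour/gray dpi, $3 = mono dpi
downsample_flags() {
    printf '%s\n' \
        -g800x1080 -r150 -dPDFFitPage -dFastWebView \
        "-sColorConversionStrategy=$1" \
        -dDownsampleColorImages=true -dDownsampleGrayImages=true \
        -dDownsampleMonoImages=true \
        "-dColorImageResolution=$2" "-dGrayImageResolution=$2" \
        "-dMonoImageResolution=$3" \
        -dColorImageDownsampleThreshold=1.0 \
        -dGrayImageDownsampleThreshold=1.0 \
        -dMonoImageDownsampleThreshold=1.0
}

# kindle_gs MODE input.pdf output.pdf   (hypothetical wrapper)
# The body runs in a subshell, so its variables stay local.
kindle_gs() (
    mode=$1 src=$2 dst=$3
    case $mode in
        keep)       extra= ;;                                    # minimal changes
        moderate)   extra=$(downsample_flags RGB 150 300) ;;
        aggressive) extra=$(downsample_flags Gray 75 150) ;;
        text)       extra='-dFILTERIMAGE -dFILTERVECTOR' ;;      # cut all graphics
        *) echo "usage: kindle_gs keep|moderate|aggressive|text in.pdf out.pdf" >&2
           exit 2 ;;
    esac
    # $extra is deliberately unquoted so it splits into separate flags.
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 $extra \
       -sstdout=%stderr -dNOPAUSE -dQUIET -dBATCH \
       "-sOutputFile=$dst" "$src"
)
```

For example, kindle_gs moderate input.pdf output.pdf would run the moderate-downsampling recipe.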
[I.S. My goals are to fix pdfs so they 1. don't crash my Kindle, 2. don't freeze my Kindle or take too long to load each page, and 3. don't take up too much of the limited disk space on my Kindle. Preferably also 4. not rasterizing text, 5. not cutting out all images, which can sometimes lose tables etc., and 6. not reflowing text, which will generally lose tables. But I'm happy to downsample most images.]
[I.S. Note that I'm keeping copies of the originals. This is not a way to save disk space!]
For scanned pdfs, Willus's k2pdfopt is a great option. I've set up Mac Automator for
k2pdfopt -mode copy -dev dx
or occasionally just -mode copy.
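Outside Automator, the same k2pdfopt call can be run over several scanned files from a terminal. This is only a sketch: k2_batch is a made-up name, -mode copy and -dev dx are the options mentioned above, and -x (exit when done instead of waiting for a key press) is my assumption about k2pdfopt's options and should be checked against its built-in help.

```shell
# k2_batch file1.pdf file2.pdf ...   (hypothetical helper)
k2_batch() {
    for f in "$@"; do
        # Skip anything that is not a regular file.
        [ -f "$f" ] || { echo "skipping $f: not a file" >&2; continue; }
        # -x is assumed to stop k2pdfopt from pausing at the end.
        k2pdfopt -mode copy -dev dx -x "$f" || echo "k2pdfopt failed on $f" >&2
    done
}
```

Calling k2_batch *.pdf in a folder of scans would convert each file in turn and carry on past any failure.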
For pdfs that were born as pdfs rather than scanned, I'd rather not rasterize everything.
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -sstdout=%stderr -dNOPAUSE -dQUIET -dBATCH
can usually convert files so that the Kindle Dx can open them, but the Kindle will still slow down, freeze, or crash on some pages.
One option is to combine Ghostscript and Mutool as follows:
- gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -sstdout=%stderr -dNOPAUSE -dQUIET -dBATCH to pre-process pdfs to remove passwords,
- mutool clean -g -g -d -s -l to sort out the junk, and then
- gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -sstdout=%stderr -dNOPAUSE -dQUIET -dBATCH again to get a smaller and faster pdf.
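The three steps above can be chained in one function so that the intermediate files are cleaned up automatically. This is a sketch under assumptions: gs14 and kindle_pipeline are made-up names, the temp-file handling is my own, and the gs and mutool flags are exactly the ones in the list.

```shell
# gs14 input.pdf output.pdf : the plain Ghostscript rewrite used in steps 1 and 3
gs14() {
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -sstdout=%stderr \
       -dNOPAUSE -dQUIET -dBATCH "-sOutputFile=$2" "$1"
}

# kindle_pipeline input.pdf output.pdf   (hypothetical helper)
# The body runs in a subshell so the trap and variables stay local.
kindle_pipeline() (
    src=$1 dst=$2
    tmp=$(mktemp -d) || exit 1
    trap 'rm -rf "$tmp"' EXIT          # remove intermediates on exit
    gs14 "$src" "$tmp/1.pdf" &&                              # step 1: rewrite, drop passwords
    mutool clean -g -g -d -s -l "$tmp/1.pdf" "$tmp/2.pdf" && # step 2: clean up
    gs14 "$tmp/2.pdf" "$dst"                                 # step 3: rewrite again
)
```

Each step runs only if the previous one succeeded, so a failure in Ghostscript or Mutool leaves no half-written output behind from the later steps.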
Note: I think Mutool's 3rd -g is the equivalent of Ghostscript's -dDetectDuplicateImages. Since merging duplicates seems to slow rendering down, it may be better to do the opposite and turn it off. I'm not sure how to set it to false. -dDetectDuplicateImages false? -uDetectDuplicateImages?
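On the syntax question: as far as I can tell from the Ghostscript documentation, boolean parameters are set as -dName=false with no space, DetectDuplicateImages defaults to true, and -u only undefines a name set earlier with -d or -s rather than setting it to false. Under that reading, a sketch would be the following (gs14_nodedupe is a made-up name, and whether this actually speeds up rendering on the Kindle is untested):

```shell
# gs14_nodedupe input.pdf output.pdf   (hypothetical helper)
# Same plain rewrite as before, with duplicate-image detection switched off.
gs14_nodedupe() {
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
       -dDetectDuplicateImages=false \
       -sstdout=%stderr -dNOPAUSE -dQUIET -dBATCH \
       "-sOutputFile=$2" "$1"
}
```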
Note: I'm using gtime to time pdf rendering.
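For reference, a timing run could look roughly like the sketch below. The assumptions are mine: gtime is GNU time (as installed by Homebrew on a Mac), rendering is done with Ghostscript's nullpage device, which renders and discards the output, and time_render is a made-up name. Desktop rendering time is only a rough proxy for how the Kindle will behave.

```shell
# time_render file1.pdf file2.pdf ...   (hypothetical helper)
time_render() {
    for f in "$@"; do
        printf '%s: ' "$f" >&2
        # %e = elapsed wall-clock seconds, %M = peak memory in KB (GNU time)
        gtime -f '%e s, %M KB peak' \
            gs -sDEVICE=nullpage -r150 -dNOPAUSE -dQUIET -dBATCH "$f"
    done
}
```

Running time_render before.pdf after.pdf prints one line per file, which makes it easy to see whether a processing step made rendering faster or slower.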
A single-step tool in a single application would help. An image-reduction tool would also help. Ghostscript's documentation is hard to follow.
- For cleanup, as an alternative to running mutool:
-dFastWebView might help.
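-dNOGC exists, which suggests that Ghostscript does garbage collection by default, though that may refer to the interpreter's own memory rather than to unused objects in the pdf.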
- For image reduction:
-dPDFSETTINGS=/screen seems to work better in 9.50 than 9.23. /ebook might be better since it embeds all fonts.
-dFILTERIMAGE -dFILTERVECTOR also work better in 9.50 than 9.23, but are more drastic than I'd like.
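To compare the presets on a given file, something like the following sketch could write one output per preset and print its size. compare_presets and the output naming (name-screen.pdf, name-ebook.pdf) are made up for the example; the gs flags are the ones used elsewhere in this post.

```shell
# compare_presets input.pdf   (hypothetical helper)
# Writes input-screen.pdf and input-ebook.pdf next to the original.
compare_presets() (
    src=$1
    base=${src%.pdf}
    for preset in screen ebook; do
        out="$base-$preset.pdf"
        gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
           "-dPDFSETTINGS=/$preset" \
           -sstdout=%stderr -dNOPAUSE -dQUIET -dBATCH \
           "-sOutputFile=$out" "$src" || continue
        # wc -c pads its output on macOS, so strip the spaces.
        printf '%s\t%s bytes\n' "$preset" "$(wc -c < "$out" | tr -d ' ')"
    done
)
```

File size is only half of the picture; the outputs still have to be checked on the Kindle itself for page-turn speed.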
A lot of settings seem to rely on input resolution and/or input page size.
-r seems to rely on input page size, rather than output page size. The Kindle Dx is 800 pixels by 1180 pixels.
-dDownScaleFactor reduces relative to input resolution.
-g800x1080 seems to crop pages, not shrink them.
I think -sDEVICE=pdfimage8 rasterizes everything, like k2pdfopt.
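My current understanding, which may be wrong, is that pdfwrite thinks in points (1/72 inch) rather than pixels, so a pixel size given with -g is only meaningful together with -r, and that pages are scaled rather than cropped only when the page size is fixed and -dPDFFitPage is set. Under that assumption, the page size can be given in points directly. The helper names below are made up; -dDEVICEWIDTHPOINTS, -dDEVICEHEIGHTPOINTS, -dFIXEDMEDIA and -dPDFFitPage are Ghostscript switches.

```shell
# px_to_pt PIXELS DPI : convert pixels to points (1/72 inch), rounded to nearest
px_to_pt() { echo $(( ($1 * 72 + $2 / 2) / $2 )); }

# fit_page_flags WIDTH_PX HEIGHT_PX DPI   (hypothetical helper)
# Prints the flags that fix the output page size and scale each page to fit it.
fit_page_flags() {
    printf '%s\n' \
        "-dDEVICEWIDTHPOINTS=$(px_to_pt "$1" "$3")" \
        "-dDEVICEHEIGHTPOINTS=$(px_to_pt "$2" "$3")" \
        -dFIXEDMEDIA -dPDFFitPage
}
```

At 150 pixels per inch, 800 pixels comes out as 384 points, 1080 pixels as 518 points, and 1180 pixels as 566 points. The output of fit_page_flags 800 1180 150 could then be placed in a gs -sDEVICE=pdfwrite command in place of -g and -r.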
In some cases,
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dFastWebView -uDetectDuplicateImages -dPDFSETTINGS=/ebook -sstdout=%stderr -dNOPAUSE -dQUIET -dBATCH
yields larger and slower files than just
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -sstdout=%stderr -dNOPAUSE -dQUIET -dBATCH
... I'm not sure what to make of these results.