I have this large print file in pdf that's contains 5544 pages and is about 36mb in size. The file is created by MS Word 2010 and contains only text and a logo on each letter/document.
I split it into 5544 files and merge back into 2770 letters, based on keywords. Each letter is approx. 140-145kb.
When I merge all the letters into a new pdf print file, still containing 5544 pages, the size of the file is grown to 396mb.
All text extracting, splitting and merging is performed with calls to Apache PDFBox command-line tools from PHP, but result is the same when run from a console.
Any idea how to reduce the file size of the letters and the final print file? It seems like PDFBox has just appended each letters in the final print file, instead creating a new pdf-document.
It's only in the testing phase that all the documents are merged into the final print file, some of the documents will be send by email.
I have also tried SAMBox (a fork of PDFBox) but with nearly the same result:
pdfinfo Original.pdf
Title: Printfile
Author: Claus Hjort Bube
Creator: Microsoft® Word 2010
Producer: Microsoft® Word 2010
CreationDate: Fri May 19 12:16:34 2017 CEST
ModDate: Fri May 19 12:16:34 2017 CEST
Tagged: yes
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 5544
Encrypted: no
Page size: 595.32 x 841.92 pts (A4)
Page rot: 0
File size: 36092281 bytes
Optimized: no
PDF version: 1.5
pdfinfo PDFBox.pdf
Title: Printfile
Author: Claus Hjort Bube
Creator: Microsoft® Word 2010
Producer: Microsoft® Word 2010
CreationDate: Fri May 19 12:16:34 2017 CEST
ModDate: Fri May 19 12:16:34 2017 CEST
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 5544
Encrypted: no
Page size: 595.32 x 841.92 pts (A4)
Page rot: 0
File size: 396622354 bytes
Optimized: no
PDF version: 1.4
pdfinfo SAMBox.pdf
Creator: Sejda Console 3.2.17
Producer: SAMBox 1.1.8 (www.sejda.org)
ModDate: Tue Jul 11 23:34:33 2017 CEST
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 5544
Encrypted: no
Page size: 595.32 x 841.92 pts (A4)
Page rot: 0
File size: 378779436 bytes
Optimized: no
PDF version: 1.7