I am working on a PDF printing processor, utilizing PDFsharp & MigraDoc. I am generating merged PDFs, containing between 2,000 and 10,000 pages. The printing vendor that is printing the PDFs generated from this program is complaining about the file sizes and the amount of time that it takes to process the PDFs because of all the embedded fonts. I viewed the embedded fonts in Adobe Acrobat Reader DC and can see that there are tons of fully embedded fonts and subsets.

There are only two fonts used throughout the entire document but it looks like every element, on every page, in the PDFs has these two fonts embedded. So, just say, if there are 10 elements on a page and there are 10,000 pages, that's 20,000 embedded font sets.

The first thing I looked at was the font options used in PDFsharp & MigraDoc. There is an option for font embedding.

var renderer = new PdfDocumentRenderer(true, PdfFontEmbedding.None);

var options = new XPdfFontOptions(PdfFontEmbedding.None);

using (var gfx = XGraphics.FromPdfPage(currentPage)) // currentPage is of type PdfPage
{
    gfx.MFEH = PdfFontEmbedding.None;
    ...

Originally these embedding options were set to PdfFontEmbedding.Always, but I changed them to .None hoping that the issue would be resolved. It wasn't. In fact, nothing changed: the output still had the same number of embedded fonts and the same file size.

The printing vendor called me and informed me that he'd taken the PDF, converted it to PostScript and then back to PDF, and the file size was reduced by two-thirds and all the font embedding was gone.

From what I know about PostScript (basically, nothing), I assume that the fonts are no longer embedded because the file is some sort of vector format and the text can no longer be selected. I guess this isn't an issue for the client or the vendor. They seemed to be happy with the idea of converting the generated PDFs to PostScript files and then back to PDF.

So, I have been researching possible ways of doing those conversions in C# but haven't really found much. I have seen some mentions of Ghostscript and Ghostscript.NET, but the documentation for those is sparse and I haven't seen any good examples.

Does anyone know a good way to do these conversions, use PDFsharp and/or MigraDoc to keep fonts from being embedded, or know of another good solution to this issue?

1 Answer

If you create a new document with PDFsharp or MigraDoc then each font should be embedded only once per PDF file, no matter how many pages there are, no matter how many elements use a font.

If you create 1000 PDF documents with one page each and combine them into one document with 1000 pages, then you will have 1000 copies of the font. PDFsharp does no size optimisation when merging PDF documents.
So create one document with all pages in a single run.
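
As a rough illustration of the single-run approach, here is a minimal sketch using MigraDoc. It assumes the PDFsharp/MigraDoc 1.5x API (newer versions may prefer the parameterless PdfDocumentRenderer constructor); the page count, the text, the font name and the output file name are placeholders, not taken from your program.

```csharp
using MigraDoc.DocumentObjectModel;
using MigraDoc.Rendering;

class SingleRunExample
{
    static void Main()
    {
        // One MigraDoc document that holds every page of the print job.
        var document = new Document();

        // Placeholder loop: each section starts on a new page.
        for (int i = 1; i <= 1000; i++)
        {
            Section section = document.AddSection();
            Paragraph paragraph = section.AddParagraph("Letter " + i);
            paragraph.Format.Font.Name = "Arial"; // placeholder font
        }

        // Render once and save once, so the renderer handles the
        // fonts for the whole file in a single pass.
        var renderer = new PdfDocumentRenderer(true);
        renderer.Document = document;
        renderer.RenderDocument();
        renderer.PdfDocument.Save("AllPages.pdf"); // placeholder file name
    }
}
```

If the individual letters are currently built as separate MigraDoc documents, the idea would be to add their content as sections of one shared Document instead of rendering each one to its own PDF first.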

You wrote: "So, just say, if there are 10 elements on a page and there are 10,000 pages, that's 20,000 embedded font sets." This should not happen and in my experience it does not happen when creating a document with 10,000 pages in a single run.
PDFsharp can be used to merge PDF files, but then you will get duplicated fonts.
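
For comparison, the merge pattern that leads to duplicated fonts typically looks like the sketch below. It follows the usual PDFsharp import approach (PdfReader.Open with PdfDocumentOpenMode.Import); the class name, method name and parameters are hypothetical, and I am only guessing that your code does something similar.

```csharp
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

static class MergeExample
{
    // Hypothetical helper: concatenates existing PDF files into one file.
    public static void Merge(string[] inputFiles, string outputFile)
    {
        using (var output = new PdfDocument())
        {
            foreach (string file in inputFiles)
            {
                // Import mode is required to copy pages into another document.
                using (PdfDocument input = PdfReader.Open(file, PdfDocumentOpenMode.Import))
                {
                    for (int i = 0; i < input.PageCount; i++)
                    {
                        // Each imported page brings along the resources of its
                        // source file, including that file's embedded fonts.
                        output.AddPage(input.Pages[i]);
                    }
                }
            }
            output.Save(outputFile);
        }
    }
}
```

Every input file contributes its own font objects, and nothing in this loop deduplicates them.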

The font embedding options you mention apply to new content that is being added to PDF files. They have no effect on fonts that are already embedded in PDF files that are merged or modified.