Extract embedded PDF fonts to an external ttf file using some utility or script

Question

Is it possible to extract fonts that are embedded in a PDF file to an external ttf file using some utility or script?

1. If the fonts used in a PDF file (embedded or not) are present in the system, I can use the pdf2swf and swfextract tools from swftools to determine the names of the fonts used in the PDF file, then compile the respective system fonts at run-time and load them into my AIR application.

2. If the fonts used in the PDF are absent from the system, there are two possibilities:

2.1. If they are also absent from the PDF file (not embedded), we can only use a similar system font based on the font name.

2.2. If they are embedded in the PDF file, I want to know whether it is possible at all to extract them to external ttf files so that I can compile each of them to a separate swf file at run-time.

willvv · Accepted Answer · 2010-02-09T06:32:12

I know it's been a while since you asked this, but I figured I might be able to help.

I don't know if there is any utility that will allow you to extract the Font files, but you can do it manually.

A PDF file is a sequence of numbered objects written mostly as plain text, with binary data held in stream objects. You can open it in a text editor and look for the fonts.

The fonts are specified in FontDescriptor objects, e.g.:

<</Type/FontDescriptor/FontName/ABCDEE+Algerian ... /FontFile2 24 0 R>>

This says that the font named Algerian has its TrueType font program (/FontFile2) stored in object 24. The ABCDEE+ prefix is a subset tag, meaning only the glyphs the document uses are embedded. Search the document for the line "24 0 obj". The dictionary that follows it lists the properties of the stream holding the font file, including its /Length in bytes, and the font data itself starts right after the "stream" keyword and runs up to "endstream".
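If you would rather script the manual search described above, here is a rough C# sketch. It is not part of the original answer: the class PdfFontScan and its methods FindTrueTypeFonts and ReadStream are made-up names. It assumes /FontName appears before /FontFile2 inside the descriptor (as in the example), that the generation number is 0, and that /Length is written as a direct number. A text scan like this will not work on PDFs that pack their objects into object streams (PDF 1.5 and later) or that are encrypted.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class PdfFontScan
{
    // ISO-8859-1 maps every byte to exactly one char, so string indexes equal byte offsets.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Returns (font name, object number) for each descriptor that references /FontFile2.
    public static List<KeyValuePair<string, int>> FindTrueTypeFonts(byte[] pdf)
    {
        string text = Latin1.GetString(pdf);
        var pattern = new Regex(@"/FontName\s*/([^\s/<>\[\]()]+)[^>]*?/FontFile2\s+(\d+)\s+\d+\s+R");
        var fonts = new List<KeyValuePair<string, int>>();
        foreach (Match m in pattern.Matches(text))
        {
            fonts.Add(new KeyValuePair<string, int>(m.Groups[1].Value, int.Parse(m.Groups[2].Value)));
        }
        return fonts;
    }

    // Returns the raw (still encoded) stream bytes of the object "<objectNumber> 0 obj".
    public static byte[] ReadStream(byte[] pdf, int objectNumber)
    {
        string text = Latin1.GetString(pdf);
        Match header = Regex.Match(text, @"(?<!\d)" + objectNumber + @"\s+0\s+obj\b");
        if (!header.Success) throw new InvalidOperationException("Object not found.");

        // The stream keyword must come before this object's endobj, otherwise it belongs to a later object.
        Match keyword = new Regex(@"stream\r?\n").Match(text, header.Index);
        int endObj = text.IndexOf("endobj", header.Index, StringComparison.Ordinal);
        if (!keyword.Success || (endObj >= 0 && endObj < keyword.Index))
            throw new InvalidOperationException("Object has no stream.");

        // Read /Length from the dictionary; /Length1 and indirect references ("25 0 R") are not matched.
        string dictionary = text.Substring(header.Index, keyword.Index - header.Index);
        Match length = Regex.Match(dictionary, @"/Length\s+(\d+)\b(?!\s+\d+\s+R)");
        if (!length.Success) throw new NotSupportedException("Indirect /Length is not handled.");

        var data = new byte[int.Parse(length.Groups[1].Value)];
        Array.Copy(pdf, keyword.Index + keyword.Length, data, 0, data.Length);
        return data;
    }
}
```

For the example descriptor, PdfFontScan.ReadStream(pdfBytes, 24) would return the still-encoded bytes of object 24, where pdfBytes is the whole file read into memory.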

This stream contains the ttf file, usually compressed (the stream dictionary then lists /Filter /FlateDecode). To decompress it, you can use this method:

  using System.IO;
  using System.IO.Compression;

  // Inflates the contents of a PDF stream encoded with /FlateDecode
  private static byte[] DecodeFlateDecodeData(byte[] data)
  {
     using (var compressedDataStream = new MemoryStream(data))
     using (var outputStream = new MemoryStream())
     {
        // Skip the two-byte zlib header (DeflateStream expects raw deflate data)
        compressedDataStream.ReadByte();
        compressedDataStream.ReadByte();

        using (var deflateStream = new DeflateStream(compressedDataStream, CompressionMode.Decompress))
        {
           deflateStream.CopyTo(outputStream);
        }
        return outputStream.ToArray();
     }
  }
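If you are on .NET 6 or later, ZLibStream reads the zlib header itself, so the two-byte skip is not needed. The sketch below is only an illustration under that assumption: FontStreamDecoder, Inflate and SaveAsTtf are made-up names, and it assumes the stream uses a single /FlateDecode filter.

```csharp
using System.IO;
using System.IO.Compression;

public static class FontStreamDecoder
{
    // Inflates zlib-wrapped data, which is what a /FlateDecode stream contains.
    public static byte[] Inflate(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var zlib = new ZLibStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            zlib.CopyTo(output);
            return output.ToArray();
        }
    }

    // Decodes the stream bytes and writes the font program to disk.
    public static void SaveAsTtf(byte[] encodedStream, string path)
    {
        File.WriteAllBytes(path, Inflate(encodedStream));
    }
}
```

Something like FontStreamDecoder.SaveAsTtf(streamBytes, "Algerian.ttf") would then write the font out, where streamBytes holds the bytes between "stream" and "endstream". Keep in mind that a subset font contains only the glyphs the PDF uses, so the resulting file may not work as a general-purpose font.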

I hope this helps you... or helps somebody else
