How to display PDF text with the correct embedded font

Question

I'm currently writing a PDF reader, and I'm having problems displaying text that uses embedded fonts.

I thought the font file stream of a Type 1 font would contain PostScript code describing how each individual glyph is drawn.

I tried to Flate-decode the stream, but the result was unreadable and looked like nonsense.

    // Requires: using System.IO; using System.IO.Compression;
    private static byte[] DecodeFlateDecodeData(byte[] data)
    {
        using (var compressedDataStream = new MemoryStream(data))
        using (var outputStream = new MemoryStream())
        {
            // Skip the 2-byte zlib header (CMF and FLG bytes): DeflateStream only understands
            // raw deflate data, not the zlib wrapper that /FlateDecode streams use.
            compressedDataStream.ReadByte();
            compressedDataStream.ReadByte();

            // Dispose the DeflateStream as well, so its internal resources are released.
            using (var deflateStream = new DeflateStream(compressedDataStream, CompressionMode.Decompress))
            {
                // Copy all decompressed bytes into the output buffer.
                deflateStream.CopyTo(outputStream);
            }

            return outputStream.ToArray();
        }
    }

Source: Extract embedded PDF fonts to an external ttf file using some utility or script
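On .NET 6 or later, the decompression routine above could be simplified with `System.IO.Compression.ZLibStream`, which reads the two-byte zlib header itself, so no bytes have to be skipped by hand. This is a minimal sketch; the class name `FlateDecoder` is hypothetical.

```csharp
using System.IO;
using System.IO.Compression;

public static class FlateDecoder
{
    // Hypothetical helper: decodes a /FlateDecode stream with ZLibStream (.NET 6+),
    // which parses the zlib header itself instead of requiring two bytes to be skipped.
    public static byte[] Decode(byte[] data)
    {
        using var input = new MemoryStream(data);
        using var zlib = new ZLibStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        zlib.CopyTo(output);
        return output.ToArray();
    }
}
```

Either version should produce the same bytes for a well-formed /FlateDecode stream, so the decompression step itself is unlikely to be what makes the output look like nonsense.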

I expected something like this:

    %!FontType1-1.0: Symbol 001.003
    %%CreationDate: Thu Apr 16 1987
    %%VMusage: 27647 34029
    % Copyright (c) 1985, 1987 Adobe Systems
    % Incorporated. All rights reserved.
    11 dict begin
    /FontInfo 8 dict dup begin
    /version (001.003) readonly def
    /FullName (Symbol) readonly def
    /FamilyName (Symbol) readonly def
    /Weight (Medium) readonly def
    /ItalicAngle 0 def
    /isFixedPitch false def
    /UnderlinePosition -98 def
    /UnderlineThickness 54 def
    end readonly def
    /FontName /Symbol def
    .
    .
    .
    cleartomark

as shown in the Type 1 font format reference on page 11.

Or do I fundamentally misunderstand something here?
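For comparison, even a genuine Type 1 font embedded as /FontFile would not decode to readable PostScript from start to finish: per the PDF specification, that stream holds a cleartext part (/Length1 bytes), followed by a binary, eexec-encrypted part (/Length2 bytes) and a fixed-content trailer (/Length3 bytes). The following is a minimal sketch of eexec decryption, assuming the constants from the Type 1 specification (initial key 55665, c1 = 52845, c2 = 22719, four leading random bytes discarded). The class name `Eexec` is hypothetical.

```csharp
using System;

public static class Eexec
{
    // Constants from the Adobe Type 1 Font Format specification.
    private const uint C1 = 52845;
    private const uint C2 = 22719;
    public const ushort EexecKey = 55665;

    // Decrypts eexec-encrypted bytes and drops the leading random bytes (4 for eexec).
    public static byte[] Decrypt(byte[] cipher, ushort key = EexecKey, int discard = 4)
    {
        uint r = key;
        var plain = new byte[Math.Max(0, cipher.Length - discard)];
        for (int i = 0; i < cipher.Length; i++)
        {
            byte c = cipher[i];
            byte p = (byte)(c ^ (r >> 8));
            // The running key is updated from the *cipher* byte, kept to 16 bits.
            r = ((c + r) * C1 + C2) & 0xFFFF;
            if (i >= discard)
                plain[i - discard] = p;
        }
        return plain;
    }

    // Inverse operation (useful for testing): encrypts bytes with the same running key.
    public static byte[] Encrypt(byte[] plain, ushort key = EexecKey)
    {
        uint r = key;
        var cipher = new byte[plain.Length];
        for (int i = 0; i < plain.Length; i++)
        {
            byte c = (byte)(plain[i] ^ (r >> 8));
            cipher[i] = c;
            r = ((c + r) * C1 + C2) & 0xFFFF;
        }
        return cipher;
    }
}
```

With the decoded stream in `decoded` and the /Length1 and /Length2 values from its stream dictionary, the private part would be `Eexec.Decrypt(decoded[length1..(length1 + length2)])`. The individual charstrings inside it are encrypted again with the same algorithm, using key 4330.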

KenS's original answer, which was appropriate as a comment: "Without seeing the original stream it's rather hard to comment. Is it possible that the original was a Type1C rather than a Type1 (Subtype)? It would help a lot if you could post a URL where we could look at the original PDF file." — mkl
This is the file I tried to read adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/… — Christoph Bruns
And it seems that the problem is exactly what I said: it's a Type1C font. Which I rather think vindicates my original answer being an answer... — KenS
@KenS As edited by Bhargav Rao, it became an answer, albeit a very short one. Before that, it was formulated as a comment (a very appropriate comment, easily giving rise to an answer, but a comment nonetheless). — mkl
I know it became an answer, but the original answer contained exactly the same information and additionally pointed out that there was insufficient information in the question. In addition, the 'answer' now refers to an 'issue occurring' when no real issue is described. Basically, I'm fed up with people (who can't answer the question themselves and often don't understand the details of the question or answer) deciding that my answers are comments when, in my opinion, they are answers. This is especially true in the PostScript-tagged section of this site. I disagree that it's a comment. — KenS

KenS · Accepted Answer · 2019-05-09T15:57:34


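This happens if the embedded font is Type1C (the /Subtype of a /FontFile3 stream) rather than Type 1. Type1C is the Compact Font Format (CFF), a binary format, so the decompressed stream will not look like the PostScript text shown in the Type 1 reference.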
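To tell which case you have, the reliable signal is the font descriptor key: /FontFile holds a Type 1 program, /FontFile2 a TrueType program, and /FontFile3 a program whose /Subtype is /Type1C, /CIDFontType0C or /OpenType. As a rough cross-check on the decompressed bytes, here is a heuristic sketch that guesses the format from the leading bytes; the type and method names are hypothetical.

```csharp
public enum EmbeddedFontKind { Unknown, Type1Cleartext, Cff, TrueType, OpenTypeCff }

public static class FontSniffer
{
    // Heuristic only: guesses the font program format from the first bytes of a decoded stream.
    public static EmbeddedFontKind Detect(byte[] d)
    {
        // Type 1 cleartext portion starts with "%!" (e.g. "%!PS-AdobeFont" or "%!FontType1").
        if (d.Length >= 2 && d[0] == (byte)'%' && d[1] == (byte)'!')
            return EmbeddedFontKind.Type1Cleartext;
        if (d.Length < 4)
            return EmbeddedFontKind.Unknown;
        // OpenType (sfnt) wrapper with CFF outlines.
        if (d[0] == (byte)'O' && d[1] == (byte)'T' && d[2] == (byte)'T' && d[3] == (byte)'O')
            return EmbeddedFontKind.OpenTypeCff;
        // TrueType sfnt version 0x00010000 or the 'true' tag.
        if ((d[0] == 0 && d[1] == 1 && d[2] == 0 && d[3] == 0) ||
            (d[0] == (byte)'t' && d[1] == (byte)'r' && d[2] == (byte)'u' && d[3] == (byte)'e'))
            return EmbeddedFontKind.TrueType;
        // Bare CFF header: major version 1, minor version, hdrSize (>= 4), offSize (1..4).
        if (d[0] == 1 && d[2] >= 4 && d[3] >= 1 && d[3] <= 4)
            return EmbeddedFontKind.Cff;
        return EmbeddedFontKind.Unknown;
    }
}
```

A result of `Cff` means the glyph outlines are stored as charstrings (typically Type 2) inside a binary CFF structure, which calls for a CFF parser rather than a PostScript interpreter.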
