I am using wkhtmltopdf.exe to convert HTML to PDF with the source code below. The problem is that the PDF shows "?" in place of all non-English characters (Chinese, Japanese, Russian, Arabic). When the same content is output as HTML, the characters display correctly. I tried setting different encodings on the HTML (utf-8, utf-16, gb2312), but the PDF still does not render non-English text.

I read in the wkhtmltopdf forums about installing Chinese fonts on the server, but it looks like those instructions are not meant for a Windows server environment. Besides, wouldn't the fonts already be available on the server, given that the HTML renders correctly?

Any ideas to make it work?

Code:

    private void WritePDF(string html)
    {
        string inFileName,
                outFileName,
                tempPath;
        Process p;
        System.IO.StreamWriter stdin;
        ProcessStartInfo psi = new ProcessStartInfo();


        tempPath = Request.PhysicalApplicationPath 
            + ConfigurationManager.AppSettings[Constants.AppSettings.ExportToPdfTempFolder];
        inFileName = Session.SessionID + ".htm";
        outFileName = Session.SessionID + ".pdf";

        // run the conversion utility
        psi.UseShellExecute = false;
        psi.FileName = Server.MapPath(ConfigurationManager.AppSettings[Constants.AppSettings.ExportToPdfExecutablePath]);
        psi.CreateNoWindow = true;
        psi.RedirectStandardInput = true;
        psi.RedirectStandardOutput = true;
        psi.RedirectStandardError = true;
        //psi.StandardOutputEncoding = System.Text.Encoding.gb;

        // note that we tell wkhtmltopdf to be quiet and not run scripts
        // NOTE: I couldn't figure out a way to get both stdin and stdout redirected so we have to write to a file and then clean up afterwards
        psi.Arguments = "-q -n - " + tempPath + outFileName;

        p = Process.Start(psi);

        try
        {
            stdin = p.StandardInput;
            stdin.AutoFlush = true;

            stdin.Write(html);
            stdin.Close();

            if (p.WaitForExit(15000))
            {
                // NOTE: the application hangs when we use WriteFile (due to the Delete below?); this works
                Response.BinaryWrite(System.IO.File.ReadAllBytes(tempPath + outFileName));
            }
        }
        finally
        {
            p.Close();
            p.Dispose();
        }

        // delete the pdf
        System.IO.File.Delete(tempPath + outFileName);
    }

Did you manage to solve this issue? Any progress to report? I recently converted my app from disk access to direct streams and it still works fine. So, is this still an issue? - Joel Peltonen

2 Answers

5
votes

wkhtmltopdf definitely can render non-English characters such as Chinese, Japanese, Russian and Arabic. In most cases they are not displayed because the HTML template is missing a meta tag with the appropriate charset definition. By default, .NET text writers such as StreamWriter encode their output as UTF-8, so in that case the HTML template should contain the following meta tag:

<meta http-equiv="content-type" content="text/html; charset=utf-8" />
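
As a rough illustration of the charset point, here is a minimal sketch that makes sure the HTML declares UTF-8 and then sends it to wkhtmltopdf as raw UTF-8 bytes. It rests on an assumption not confirmed in this thread: that `stdin.Write(html)` in the question encodes with the console code page, which could by itself turn unsupported characters into "?". The class and method names (`PdfStdinSketch`, `EnsureUtf8Meta`, `WriteHtmlToStdin`) are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class PdfStdinSketch
{
    // Hypothetical helper: adds a UTF-8 charset meta tag when the HTML
    // declares no charset at all. An existing declaration is left untouched,
    // so a document that declares another charset would still need fixing by hand.
    public static string EnsureUtf8Meta(string html)
    {
        if (html.IndexOf("charset=", StringComparison.OrdinalIgnoreCase) >= 0)
            return html;

        const string meta =
            "<meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\" />";
        int head = html.IndexOf("<head>", StringComparison.OrdinalIgnoreCase);

        // Insert right after <head> if present, otherwise prepend the tag.
        return head >= 0 ? html.Insert(head + "<head>".Length, meta) : meta + html;
    }

    // Hypothetical helper: writes the HTML to the process's stdin as UTF-8 bytes,
    // bypassing the text encoding of the StandardInput writer (assumed to be the
    // console code page). Requires RedirectStandardInput = true on the start info.
    public static void WriteHtmlToStdin(Process p, string html)
    {
        byte[] bytes = new UTF8Encoding(false).GetBytes(EnsureUtf8Meta(html));
        p.StandardInput.BaseStream.Write(bytes, 0, bytes.Length);
        p.StandardInput.Close(); // flushes and signals end of input
    }
}
```

In the question's `WritePDF`, this would stand in for the `stdin.Write(html); stdin.Close();` pair, for example `PdfStdinSketch.WriteHtmlToStdin(p, html);`.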

By the way, instead of calling wkhtmltopdf directly you may use one of the .NET wrappers, such as NReco PdfGenerator (I'm the author of this library).

0
votes

Make sure your font supports the characters and your source is UTF-8, and it should work. I have tested wkhtmltopdf with Korean, Chinese, Polish and various other characters, and it has always worked. See my answer to a similar question: https://stackoverflow.com/a/11862584/694325

I write my HTML sources to disk as shown here; otherwise my PDF generation is very similar to yours. I'd check that everything, everywhere, is UTF-8.

// Requires: using System.IO;
// Write the HTML source to disk explicitly encoded as UTF-8.
using (TextWriter tw = new StreamWriter(path, false, System.Text.Encoding.UTF8))
{
    tw.WriteLine(contents);
}

PDFs generated from source like this seem to work without problems.
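
To make the file-based route concrete, here is a minimal sketch of writing the HTML as UTF-8 and pointing wkhtmltopdf at that file instead of stdin. The names `PdfFileSketch`, `BuildArguments` and `Convert` are hypothetical, the 15-second timeout is copied from the question, and `--encoding utf-8` is wkhtmltopdf's option for the default input encoding, added here as a precaution rather than something this answer used.

```csharp
using System.Diagnostics;
using System.IO;
using System.Text;

public static class PdfFileSketch
{
    // Hypothetical helper: builds the wkhtmltopdf argument string.
    // -q and -n are the flags from the question; paths are quoted so
    // that folders containing spaces survive argument parsing.
    public static string BuildArguments(string inPath, string outPath)
    {
        return "-q -n --encoding utf-8 \"" + inPath + "\" \"" + outPath + "\"";
    }

    // Hypothetical helper: writes the HTML as UTF-8, runs the converter and
    // reports whether it exited cleanly within the timeout.
    public static bool Convert(string exePath, string html, string inPath, string outPath)
    {
        // Same idea as the StreamWriter snippet above: force UTF-8 on disk.
        File.WriteAllText(inPath, html, Encoding.UTF8);

        var psi = new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = BuildArguments(inPath, outPath),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process p = Process.Start(psi))
        {
            // ExitCode is only read once the process has actually exited.
            return p.WaitForExit(15000) && p.ExitCode == 0;
        }
    }
}
```

The input file name should keep an HTML extension such as the `.htm` used in the question, and the caller is still responsible for deleting both temporary files afterwards.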