472
votes

I want to generate a PDF by passing HTML contents to a function. I have made use of iTextSharp for this but it does not perform well when it encounters tables and the layout just gets messy.

Is there a better way?

30
You can use GemBox.Document for this. Also here you can find a sample code for converting HTML file into a PDF file.Mario Z
Which version of iTextSharp do you use and could you share your html?Amedee Van Gasse
Still no answer to my request for additional information. Please also add if you are using HTMLWorker or XMLWorker.Amedee Van Gasse
What about .net core?Piero Alberto
SEPT 2019: I have added a new answer some of the options listed are free others paid and a few are available as .net core stackoverflow.com/questions/564650/…Mauricio Gracia Gutierrez

30 Answers

227
votes

EDIT: New Suggestion HTML Renderer for PDF using PdfSharp

(After trying wkhtmltopdf and suggesting to avoid it)

HtmlRenderer.PdfSharp is a 100% fully C# managed code, easy to use, thread safe and most importantly FREE (New BSD License) solution.

Usage

  1. Download HtmlRenderer.PdfSharp nuget package.
  2. Use Example Method.

    public static Byte[] PdfSharpConvert(String html)
    {
        Byte[] res = null;
        using (MemoryStream ms = new MemoryStream())
        {
            var pdf = TheArtOfDev.HtmlRenderer.PdfSharp.PdfGenerator.GeneratePdf(html, PdfSharp.PageSize.A4);
            pdf.Save(ms);
            res = ms.ToArray();
        }
        return res;
    }
    

A very Good Alternate Is a Free Version of iTextSharp

Until version 4.1.6 iTextSharp was licensed under the LGPL licence and versions until 4.16 (or there may be also forks) are available as packages and can be freely used. Of course someone can use the continued 5+ paid version.

I tried to integrate wkhtmltopdf solutions on my project and had a bunch of hurdles.

I personally would avoid using wkhtmltopdf - based solutions on Hosted Enterprise applications for the following reasons.

  1. First of all wkhtmltopdf is C++ implemented not C#, and you will experience various problems embedding it within your C# code, especially while switching between 32bit and 64bit builds of your project. Had to try several workarounds including conditional project building etc. etc. just to avoid "invalid format exceptions" on different machines.
  2. If you manage your own virtual machine its ok. But if your project is running within a constrained environment like (Azure (Actually is impossible withing azure as mentioned by the TuesPenchin author) , Elastic Beanstalk etc) it's a nightmare to configure that environment only for wkhtmltopdf to work.
  3. wkhtmltopdf is creating files within your server so you have to manage user permissions and grant "write" access to where wkhtmltopdf is running.
  4. Wkhtmltopdf is running as a standalone application, so its not managed by your IIS application pool. So you have to either host it as a service on another machine or you will experience processing spikes and memory consumption within your production server.
  5. It uses temp files to generate the pdf, and in cases Like AWS EC2 which has really slow disk i/o it is a big performance problem.
  6. The most hated "Unable to load DLL 'wkhtmltox.dll'" error reported by many users.

--- PRE Edit Section ---

For anyone who want to generate pdf from html in simpler applications / environments I leave my old post as suggestion.

TuesPechkin

https://www.nuget.org/packages/TuesPechkin/

or Especially For MVC Web Applications (But I think you may use it in any .net application)

Rotativa

https://www.nuget.org/packages/Rotativa/

They both utilize the wkhtmtopdf binary for converting html to pdf. Which uses the webkit engine for rendering the pages so it can also parse css style sheets.

They provide easy to use seamless integration with C#.

Rotativa can also generate directly PDFs from any Razor View.

Additionally for real world web applications they also manage thread safety etc...

194
votes

Update: I would now recommend PupeteerSharp over wkhtmltopdf.

Try wkhtmtopdf. It is the best tool I have found so far.

For .NET, you may use this small library to easily invoke wkhtmtopdf command line utility.

90
votes

Last Updated: October 2020

This is the list of options for HTML to PDF conversion in .NET that I have put together (some free some paid)

If none of the options above help you you can always search the NuGet packages:
https://www.nuget.org/packages?q=html+pdf

41
votes

I recently performed a PoC regarding HTML to PDF conversion and wanted to share my results.

My favorite by far is OpenHtmlToPdf

Advantages of this tool:

  • Very good HTML compatibility (e.g. it was the only tool in my example that correctly repeated table headers when a table spanned multiple pages)
  • Fluent API
  • Free and OpenSource (Creative Commons Attribution 3.0 license)
  • Available via NuGet

Other tools tested:

28
votes

Most HTML to PDF converter relies on IE to do the HTML parsing and rendering. This can break when user updates their IE. Here is one that does not rely on IE.

The code is something like this:

EO.Pdf.HtmlToPdf.ConvertHtml(htmlText, pdfFileName);

Like many other converters, you can pass text, file name, or Url. The result can be saved into a file or a stream.

27
votes

I highly recommend NReco, seriously. It has the free and paid version, and really worth it. It uses wkhtmtopdf in background, but you just need one assembly. Fantastic.

Example of use:

Install via NuGet.

var htmlContent = String.Format("<body>Hello world: {0}</body>", DateTime.Now);
var pdfBytes = (new NReco.PdfGenerator.HtmlToPdfConverter()).GeneratePdf(htmlContent);

Disclaimer: I'm not the developer, just a fan of the project :)

13
votes

Winnovative offer a .Net PDF library that supports HTML input. They offer an unlimited free trial. Depending on how you wish to deploy your project, this might be sufficient.

10
votes

You can use Google Chrome print-to-pdf feature from its headless mode. I found this to be the simplest yet the most robust method.

var url = "https://stackguides.com/questions/564650/convert-html-to-pdf-in-net";
var chromePath = @"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe";
var output = Path.Combine(Environment.CurrentDirectory, "printout.pdf");
using (var p = new Process())
    {
        p.StartInfo.FileName = chromePath;
        p.StartInfo.Arguments = $"--headless --disable-gpu --print-to-pdf={output} {url}";
        p.Start();
        p.WaitForExit();
    }

9
votes

Essential PDF can be used to convert HTML to PDF: C# sample. The sample linked to here is ASP.NET based, but the library can be used from Windows Forms, WPF, ASP.NET Webforms, and ASP.NET MVC. The library offers the option of using different HTML rendering engines : Internet Explorer (default) and WebKit (best output).

The whole suite of controls is available for free (commercial applications also) through the community license program if you qualify. The community license is the full product with no limitations or watermarks.

Note: I work for Syncfusion.

8
votes

If you don't really need a true .Net PDF library, there are numerous free HTML to PDF tools, many of which can run from a command-line.

One solution would be to pick one of those and then write a thin wrapper around that in C#. E.g., as done in this tutorial.

8
votes

There's also a new web-based document generation app - DocRaptor.com. Seems easy to use, and there's a free option.

7
votes

I used ExpertPDF Html To Pdf Converter. Does a decent job. Unfortunatelly, it's not free.

6
votes

2018's update, and Let's use standard HTML+CSS=PDF equation!

There are good news for HTML-to-PDF demands. As this answer showed, the W3C standard css-break-3 will solve the problem... It is a Candidate Recommendation with plan to turn into definitive Recommendation in 2017 or 2018, after tests.

As not-so-standard there are solutions, with plugins for C#, as showed by print-css.rocks.

4
votes

ABCpdf.NET (http://www.websupergoo.com/abcpdf-5.htm)

We use and recommend.

Very good component, it not only convert a webpage to PDF like an image but really convert text, image, formatting, etc...

It's not free but it's cheap.

4
votes

I'm the author of the Rotativa package. It allows to create PDF files directly from razor views:

https://www.nuget.org/packages/Rotativa/

Trivial to use and you have full control on the layout since you can use razor views with data from your Model and ViewBag container.

I developed a SaaS version on Azure. It makes it even easier to use it from WebApi or any .Net app, service, Azure website, Azure webjob, whatever runs .Net.

http://www.rotativahq.com/

Free accounts available.

4
votes

Below is an example of converting html + css to PDF using iTextSharp (iTextSharp + itextsharp.xmlworker)

using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;


byte[] pdf; // result will be here

var cssText = File.ReadAllText(MapPath("~/css/test.css"));
var html = File.ReadAllText(MapPath("~/css/test.html"));

using (var memoryStream = new MemoryStream())
{
        var document = new Document(PageSize.A4, 50, 50, 60, 60);
        var writer = PdfWriter.GetInstance(document, memoryStream);
        document.Open();

        using (var cssMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(cssText)))
        {
            using (var htmlMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
            {
                XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, htmlMemoryStream, cssMemoryStream);
            }
        }

        document.Close();

        pdf = memoryStream.ToArray();
}
4
votes

Quite likely most projects will wrap a C/C++ Engine rather than implementing a C# solution from scratch. Try Project Gotenberg.

To test it

docker run --rm -p 3000:3000 thecodingmachine/gotenberg:6

Curl sample

curl --request POST \
    --url http://localhost:3000/convert/url \
    --header 'Content-Type: multipart/form-data' \
    --form remoteURL=https://brave.com \
    --form marginTop=0 \
    --form marginBottom=0 \
    --form marginLeft=0 \
    --form marginRight=0 \
    -o result.pdf

C# sample.cs

using System;
using System.Net.Http;
using System.Threading.Tasks;
using System.IO;
using static System.Console;

namespace Gotenberg
{
    class Program
    {
        public static async Task Main(string[] args)
        {
            try
            {
                var client = new HttpClient();            
                var formContent = new MultipartFormDataContent
                    {
                        {new StringContent("https://brave.com/"), "remoteURL"},
                        {new StringContent("0"), "marginTop" }
                    };
                var result = await client.PostAsync(new Uri("http://localhost:3000/convert/url"), formContent);
                await File.WriteAllBytesAsync("duckduck.com.pdf", await result.Content.ReadAsByteArrayAsync());
            }
            catch (Exception ex)
            {
                WriteLine(ex);
            }
        }
    }
}

To compile

csc sample.cs -langversion:latest -reference:System.Net.Http.dll && mono ./sample.exe
3
votes

It depends on any other requirements you have.

A really simple but not easily deployable solution is to use a WebBrowser control to load the Html and then using the Print method printing to a locally installed PDF printer. There are several free PDF printers available and the WebBrowser control is a part of the .Net framework.

EDIT: If you Html is XHtml you can use PDFizer to do the job.

3
votes

PDF Vision is good. However, you have to have Full Trust to use it. I already emailed and asked why my HTML wasn't being converted on the server but it worked fine on localhost.

3
votes

It seems like so far the best free .NET solution is the TuesPechkin library which is a wrapper around the wkhtmltopdf native library.

I've now used the single-threaded version to convert a few thousand HTML strings to PDF files and it seems to work great. It's supposed to also work in multi-threaded environments (IIS, for example) but I haven't tested that.

Also since I wanted to use the latest version of wkhtmltopdf (0.12.5 at the time of writing), I downloaded the DLL from the official website, copied it to my project root, set copy to output to true, and initialized the library like so:

var dllDir = AppDomain.CurrentDomain.BaseDirectory;
Converter = new StandardConverter(new PdfToolset(new StaticDeployment(dllDir)));

Above code will look exactly for "wkhtmltox.dll", so don't rename the file. I used the 64-bit version of the DLL.

Make sure you read the instructions for multi-threaded environments, as you will have to initialize it only once per app lifecycle so you'll need to put it in a singleton or something.

2
votes

I was also looking for this a while back. I ran into HTMLDOC http://www.easysw.com/htmldoc/ which is a free open source command line app that takes an HTML file as an argument and spits out a PDF from it. It's worked for me pretty well for my side project, but it all depends on what you actually need.

The company that makes it sells the compiled binaries, but you are free to download and compile from source and use it for free. I managed to compile a pretty recent revision (for version 1.9) and I intend on releasing a binary installer for it in a few days, so if you're interested I can provide a link to it as soon as I post it.

Edit (2/25/2014): Seems like the docs and site moved to http://www.msweet.org/projects.php?Z1

2
votes

You need to use a commercial library if you need perfect html rendering in pdf.

ExpertPdf Html To Pdf Converter is very easy to use and it supports the latest html5/css3. You can either convert an entire url to pdf:

using ExpertPdf.HtmlToPdf; 
byte[] pdfBytes = new PdfConverter().GetPdfBytesFromUrl(url);

or a html string:

using ExpertPdf.HtmlToPdf; 
byte[] pdfBytes = new PdfConverter().GetPdfBytesFromHtmlString(html, baseUrl);

You also have the alternative to directly save the generated pdf document to a Stream of file on the disk.

2
votes

I found the following library more effective in converting html to pdf.
nuget: https://www.nuget.org/packages/Select.HtmlToPdf/

2
votes

This is a free library and works very easily : OpenHtmlToPdf

string timeStampForPdfName = DateTime.Now.ToString("yyMMddHHmmssff");

string serverPath = System.Web.Hosting.HostingEnvironment.MapPath("~/FolderName");
string pdfSavePath = Path.Combine(@serverPath, "FileName" + timeStampForPdfName + ".FileExtension");


//OpenHtmlToPdf Library used for Performing PDF Conversion
var pdf = Pdf.From(HTML_String).Content();

//FOr writing to file from a ByteArray
 File.WriteAllBytes(pdfSavePath, pdf.ToArray()); // Requires System.Linq
2
votes

For all those looking for an working solution in .net 5 here you go.

Here are my working solutions.

Using wkhtmltopdf:

  1. Download and install wkhtmltopdf latest version from here.
  2. Use the below code.
public static string HtmlToPdf(string outputFilenamePrefix, string[] urls,
    string[] options = null,
    string pdfHtmlToPdfExePath = @"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe")
{
    string urlsSeparatedBySpaces = string.Empty;
    try
    {
        //Determine inputs
        if ((urls == null) || (urls.Length == 0))
            throw new Exception("No input URLs provided for HtmlToPdf");
        else
            urlsSeparatedBySpaces = String.Join(" ", urls); //Concatenate URLs

        string outputFilename = outputFilenamePrefix + "_" + DateTime.Now.ToString("yyyy-MM-dd-hh-mm-ss-fff") + ".PDF"; // assemble destination PDF file name

        var p = new System.Diagnostics.Process()
        {
            StartInfo =
            {
                FileName = pdfHtmlToPdfExePath,
                Arguments = ((options == null) ? "" : string.Join(" ", options)) + " " + urlsSeparatedBySpaces + " " + outputFilename,
                UseShellExecute = false, // needs to be false in order to redirect output
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                RedirectStandardInput = true, // redirect all 3, as it should be all 3 or none
                WorkingDirectory = Path.Combine(Path.GetDirectoryName(Assembly.GetEntryAssembly().Location))
            }
        };

        p.Start();

        // read the output here...
        var output = p.StandardOutput.ReadToEnd();
        var errorOutput = p.StandardError.ReadToEnd();

        // ...then wait n milliseconds for exit (as after exit, it can't read the output)
        p.WaitForExit(60000);

        // read the exit code, close process
        int returnCode = p.ExitCode;
        p.Close();

        // if 0 or 2, it worked so return path of pdf
        if ((returnCode == 0) || (returnCode == 2))
            return outputFilename;
        else
            throw new Exception(errorOutput);
    }
    catch (Exception exc)
    {
        throw new Exception("Problem generating PDF from HTML, URLs: " + urlsSeparatedBySpaces + ", outputFilename: " + outputFilenamePrefix, exc);
    }
}
  1. And call the above method as HtmlToPdf("test", new string[] { "https://www.google.com" }, new string[] { "-s A5" });
  2. If you need to convert HTML string to PDF, the tweak the above method and replace the Arguments to Process StartInfo as $@"/C echo | set /p=""{htmlText}"" | ""{pdfHtmlToPdfExePath}"" {((options == null) ? "" : string.Join(" ", options))} - ""C:\Users\xxxx\Desktop\{outputFilename}""";

Drawbacks of this approach:

  1. The latest build of wkhtmltopdf as of posting this answer does not support latest HTML5 and CSS3. Hence if you try to export any html that as CSS GRID then the output will not be as expected.
  2. You need to handle concurrency issues.

Using chrome headless:

  1. Download and install latest chrome browser from here.
  2. Use the below code.
var p = new System.Diagnostics.Process()
{
    StartInfo =
    {
        FileName = "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe",
        Arguments = @"/C --headless --disable-gpu --run-all-compositor-stages-before-draw --print-to-pdf-no-header --print-to-pdf=""C:/Users/Abdul Rahman/Desktop/test.pdf"" ""C:/Users/Abdul Rahman/Desktop/grid.html""",
    }
};

p.Start();

// ...then wait n milliseconds for exit (as after exit, it can't read the output)
p.WaitForExit(60000);

// read the exit code, close process
int returnCode = p.ExitCode;
p.Close();
  1. This will convert html file to pdf file.
  2. If you need to convert some url to pdf then use the following as Argument to Process StartInfo

@"/C --headless --disable-gpu --run-all-compositor-stages-before-draw --print-to-pdf-no-header --print-to-pdf=""C:/Users/Abdul Rahman/Desktop/test.pdf"" ""https://www.google.com""",

Drawbacks of this approach:

  1. This works as expected with latest HTML5 and CSS3 features. Output will be same as you view in browser but when running this via IIS you need to run the AppliactionPool of your application under LocalSystem Identity or you need to provide read/write access to IISUSRS.

Using Selenium WebDriver:

  1. Install Nuget Packages Selenium.WebDriver and Selenium.WebDriver.ChromeDriver.
  2. Use the below code.
public async Task<byte[]> ConvertHtmlToPdf(string html)
{
    var directory = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.CommonDocuments), "ApplicationName");
    Directory.CreateDirectory(directory);
    var filePath = Path.Combine(directory, $"{Guid.NewGuid()}.html");
    await File.WriteAllTextAsync(filePath, html);

    var driverOptions = new ChromeOptions();
    // In headless mode, PDF writing is enabled by default (tested with driver major version 85)
    driverOptions.AddArgument("headless");
    using var driver = new ChromeDriver(driverOptions);
    driver.Navigate().GoToUrl(filePath);

    // Output a PDF of the first page in A4 size at 90% scale
    var printOptions = new Dictionary<string, object>
    {
        { "paperWidth", 210 / 25.4 },
        { "paperHeight", 297 / 25.4 },
        { "scale", 0.9 },
        { "pageRanges", "1" }
    };
    var printOutput = driver.ExecuteChromeCommandWithResult("Page.printToPDF", printOptions) as Dictionary<string, object>;
    var pdf = Convert.FromBase64String(printOutput["data"] as string);

    File.Delete(filePath);

    return pdf;
}

Advantage of this method:

  1. This just needs an Nuget installation and works as expected with latest HTML5 and CSS3 features. Output will be same as you view in browser.

With this approach, please make sure to add <PublishChromeDriver>true</PublishChromeDriver> in .csproj file as shown below:

<PropertyGroup>
  <TargetFramework>net5.0</TargetFramework>
  <LangVersion>latest</LangVersion>
  <Nullable>enable</Nullable>
  <PublishChromeDriver>true</PublishChromeDriver>
</PropertyGroup>

This will publish the chrome driver when publishing the project.

Here is the link to my working project repo - HtmlToPdf

I arrived at the above answer after almost spending 2 days with available options and finally implemented Selenium based solution and its working. Hope this helps you and save your time.

1
votes

Here is a wrapper for wkhtmltopdf.dll by pruiz

And a wrapper for wkhtmltopdf.exe by Codaxy
- also on nuget.

1
votes

Best Tool i have found and used for generating PDF of javascript and styles rendered views or html pages is phantomJS.

Download the .exe file with the rasterize.js function found in root of exe of example folder and put inside solution.

It Even allows you to download the file in any code without opening that file also it also allows to download the file when the styles and specially jquery are applied.

Following code generate PDF File :

public ActionResult DownloadHighChartHtml()
{
    string serverPath = Server.MapPath("~/phantomjs/");
    string filename = DateTime.Now.ToString("ddMMyyyy_hhmmss") + ".pdf";
    string Url = "http://wwwabc.com";

    new Thread(new ParameterizedThreadStart(x =>
    {
        ExecuteCommand(string.Format("cd {0} & E: & phantomjs rasterize.js {1} {2} \"A4\"", serverPath, Url, filename));
                           //E: is the drive for server.mappath
    })).Start();

    var filePath = Path.Combine(Server.MapPath("~/phantomjs/"), filename);

    var stream = new MemoryStream();
    byte[] bytes = DoWhile(filePath);

    Response.ContentType = "application/pdf";
    Response.AddHeader("content-disposition", "attachment;filename=Image.pdf");
    Response.OutputStream.Write(bytes, 0, bytes.Length);
    Response.End();
    return RedirectToAction("HighChart");
}



private void ExecuteCommand(string Command)
{
    try
    {
        ProcessStartInfo ProcessInfo;
        Process Process;

        ProcessInfo = new ProcessStartInfo("cmd.exe", "/K " + Command);

        ProcessInfo.CreateNoWindow = true;
        ProcessInfo.UseShellExecute = false;

        Process = Process.Start(ProcessInfo);
    }
    catch { }
}


private byte[] DoWhile(string filePath)
{
    byte[] bytes = new byte[0];
    bool fail = true;

    while (fail)
    {
        try
        {
            using (FileStream file = new FileStream(filePath, FileMode.Open, FileAccess.Read))
            {
                bytes = new byte[file.Length];
                file.Read(bytes, 0, (int)file.Length);
            }

            fail = false;
        }
        catch
        {
            Thread.Sleep(1000);
        }
    }

    System.IO.File.Delete(filePath);
    return bytes;
}
1
votes

As a representative of HiQPdf Software I believe the best solution is HiQPdf HTML to PDF converter for .NET. It contains the most advanced HTML5, CSS3, SVG and JavaScript rendering engine on market. There is also a free version of the HTML to PDF library which you can use to produce for free up to 3 PDF pages. The minimal C# code to produce a PDF as a byte[] from a HTML page is:

HtmlToPdf htmlToPdfConverter = new HtmlToPdf();

// set PDF page size, orientation and margins
htmlToPdfConverter.Document.PageSize = PdfPageSize.A4;
htmlToPdfConverter.Document.PageOrientation = PdfPageOrientation.Portrait;
htmlToPdfConverter.Document.Margins = new PdfMargins(0);

// convert HTML to PDF 
byte[] pdfBuffer = htmlToPdfConverter.ConvertUrlToMemory(url);

You can find more detailed examples both for ASP.NET and MVC in HiQPdf HTML to PDF Converter examples repository.

1
votes

You can also check Spire, it allow you to create HTML to PDF with this simple piece of code

 string htmlCode = "<p>This is a p tag</p>";
 
//use single thread to generate the pdf from above html code
Thread thread = new Thread(() =>
{ pdf.LoadFromHTML(htmlCode, false, setting, htmlLayoutFormat); });
thread.SetApartmentState(ApartmentState.STA);
thread.Start();
thread.Join();
 
// Save the file to PDF and preview it.
pdf.SaveToFile("output.pdf");
System.Diagnostics.Process.Start("output.pdf");
0
votes

Try this PDF Duo .Net converting component for converting HTML to PDF from ASP.NET application without using additional dlls.

You can pass the HTML string or file, or stream to generate the PDF. Use the code below (Example C#):

string file_html = @"K:\hdoc.html";   
string file_pdf = @"K:\new.pdf";   
try   
{   
    DuoDimension.HtmlToPdf conv = new DuoDimension.HtmlToPdf();   
    conv.OpenHTML(file_html);   
    conv.SavePDF(file_pdf);   
    textBox4.Text = "C# Example: Converting succeeded";   
}   

Info + C#/VB examples you can find at: http://www.duodimension.com/html_pdf_asp.net/component_html_pdf.aspx