18
votes

I understand that iTextSharp can be used to convert a document to PDF.

But first we have to create the document from scratch using iTextSharp.text.Document and then add elements to it.

What if I have an existing .doc file? Is it possible to convert it to PDF using iTextSharp?

Also, I want to use iTextSharp or a similar tool that can do the following with a .doc file:

  1. manipulate .doc/.docx/.txt files (for example, replace some placeholders with values from a database), and
  2. convert them to PDF.

If anyone has an idea about this, please share.

Thank you!

For maximum flexibility, you might consider separate "best-of-breed" solutions for each of the manipulation and conversion steps. That's the beauty of standard file formats (doc, docx). (comment by JasonPlutext)

6 Answers

14
votes

The Aspose.Words component can do this reliably (I'm not affiliated with Aspose in any way).

iTextSharp does not have the required feature set to load and process MS Word file formats.

3
votes

Aspose.Words is indeed a good solution, but it doesn't offer perfect fidelity. At the time of writing, it has problems with non-Roman languages, complex formatting such as floating elements, and a number of other issues.

You may want to have a look at this PDF Conversion Web Service that can be used from any Web Services capable environment including Java and .NET.

Note that I worked on this project so the usual disclaimers apply.

3
votes

You can use the existing Word automation API in Microsoft.Office.Interop.Word (Word must be installed on the machine that runs the code):

    using System;
    using System.Windows.Forms;
    using Word = Microsoft.Office.Interop.Word;

    // MainForm stands for whatever form hosts this method.
    public partial class MainForm : Form
    {
        // Converts a .doc/.docx file to PDF by automating Microsoft Word
        // (Word 2007 SP2 or later has the built-in PDF save format).
        private void Word2Pdf(string source, string target)
        {
            Word.Application word = null;
            Word.Document doc = null;
            try
            {
                // Start a hidden Word instance for this conversion.
                word = new Word.Application();
                word.Visible = false;

                // C# 4+ lets COM optional parameters be omitted (no more ref Type.Missing).
                doc = word.Documents.Open(source, ReadOnly: true);

                // wdFormatPDF makes SaveAs export the document as PDF.
                doc.SaveAs(FileName: target, FileFormat: Word.WdSaveFormat.wdFormatPDF);
            }
            catch (Exception e)
            {
                MessageBox.Show(e.Message);
            }
            finally
            {
                // Close the document without saving and quit Word, so that no hidden
                // WINWORD.EXE process is left running. The casts avoid the ambiguity
                // between the Close/Quit methods and the events of the same name.
                if (doc != null)
                    ((Word._Document)doc).Close(Word.WdSaveOptions.wdDoNotSaveChanges);
                if (word != null)
                    ((Word._Application)word).Quit(Word.WdSaveOptions.wdDoNotSaveChanges);
            }
        }
    }
1
votes

If you do not care whether the formatting will be faithful to what Word would display, there is the impressive docx2tex, which converts Word 2007 .docx files to LaTeX documents. Once in LaTeX, you have a lot of power to reformat the document programmatically and to generate a PDF from it.

I say more about the utility in an answer at tex.stackexchange.  

1
votes

I have the same issue.
After several days of searching for a solution, it seems that Docx4J (a Java-based tool) or PDF printers such as PDFCreator could be among the free options.
Realistically, only a commercial tool can do the requested task effectively.
On the Microsoft side, you could use server-side SharePoint Word Automation Services (checked on 7 June 2016), or Interop on your local computer.
Judging by what users have said on Stack Overflow and other forums, the suggested two-step conversion (DOC or DOCX to some intermediate format, then to PDF) does not seem workable, because the result is not what is expected.

0
votes

For .docx manipulation, you should use the native Open XML approach. Download the Open XML SDK 2 from Microsoft.
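
As a dependency-free sketch of the same idea (this is not the Open XML SDK itself): a .docx file is a ZIP package whose body text lives in the word/document.xml part, so placeholders can be replaced with only System.IO.Compression and LINQ to XML. The {{Name}} placeholder syntax, the dictionary standing in for database values, and the DocxPlaceholders/Demo names are hypothetical. The sketch only handles .docx (not binary .doc), ignores header and footer parts, and assumes each placeholder sits inside a single w:t element; Word often splits typed text across several runs, so a placeholder may need to be retyped in one go or the runs merged first.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxPlaceholders
{
    // Main WordprocessingML namespace; body text lives in <w:t> elements.
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces every key of `values` inside each <w:t> element of word/document.xml.
    // `docx` must be readable, writable and seekable (e.g. a FileStream or MemoryStream).
    // Assumption: each placeholder is contained in a single <w:t> element.
    public static void ReplacePlaceholders(Stream docx, IDictionary<string, string> values)
    {
        using var zip = new ZipArchive(docx, ZipArchiveMode.Update, leaveOpen: true);
        ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
            ?? throw new InvalidDataException("word/document.xml not found");

        XDocument xml;
        using (Stream s = entry.Open())
            xml = XDocument.Load(s);

        // ToList() so the tree is not modified while it is being enumerated.
        foreach (XElement t in xml.Descendants(W + "t").ToList())
            foreach (var kv in values)
                t.Value = t.Value.Replace(kv.Key, kv.Value); // LINQ to XML escapes & and <

        // Recreate the part so no stale bytes from the old content remain.
        entry.Delete();
        using (Stream s = zip.CreateEntry("word/document.xml").Open())
            xml.Save(s);
    }

    // Builds a minimal in-memory package (only the document part, not a complete
    // .docx), fills in hypothetical {{Name}}/{{Balance}} placeholders and returns the text.
    public static string Demo()
    {
        using var ms = new MemoryStream();
        using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
        using (var w = new StreamWriter(zip.CreateEntry("word/document.xml").Open()))
        {
            w.Write($"<w:document xmlns:w=\"{W}\"><w:body><w:p>" +
                    "<w:r><w:t>Dear {{Name}},</w:t></w:r>" +
                    "<w:r><w:t xml:space=\"preserve\"> your balance is {{Balance}}.</w:t></w:r>" +
                    "</w:p></w:body></w:document>");
        }

        // In a real application these values would come from the database.
        ms.Position = 0;
        ReplacePlaceholders(ms, new Dictionary<string, string>
        {
            ["{{Name}}"] = "Alice",
            ["{{Balance}}"] = "42",
        });

        // Read the rewritten part back and concatenate its text runs.
        ms.Position = 0;
        using var read = new ZipArchive(ms, ZipArchiveMode.Read, leaveOpen: true);
        using Stream part = read.GetEntry("word/document.xml").Open();
        return string.Concat(XDocument.Load(part).Descendants(W + "t").Select(t => t.Value));
    }

    public static void Main() => Console.WriteLine(Demo()); // prints: Dear Alice, your balance is 42.
}
```

The Open XML SDK exposes the same parts through typed classes (WordprocessingDocument, Text), which becomes more convenient once tables, headers or styles also need to be touched.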

Then you can convert the .docx files to PDF with this paid library: http://www.subsystems.com/dpw.htm (it's really great).