I am facing two problems, and solving either one of them would make my project work.

Here they are:

  1. How can I read a ".doc" file without using Word automation or a paid SDK such as Aspose.Words?

    (If the first one is not possible:)

  2. How can I convert a ".doc" file to ".docx" without using Word automation or a paid SDK such as Aspose.Words?

I have searched a lot, but I only found open source solutions for .docx.

This has to run on a server, so Word is not installed there.

5 Answers

You might want to give this pure .NET solution a shot:

b2xtranslator

It does not require you to install any Office application on the server.

Take a look at NPOI: it's written in .NET, and it is free and open source. Its roadmap plans to support creating the new formats in the future, but for now you could use it to read the old format and use other solutions to write the new one, which is an open standard (see the MS spec here).
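On the writing side, a .docx file is an Office Open XML package, which is a ZIP archive of XML parts. Here is a minimal sketch. It assumes you already have the document text as plain strings (for example, from whatever library reads the .doc) and that you need no formatting. Under those assumptions, the built-in `System.IO.Compression` (.NET Framework 4.5 or later) is enough to write the file. `MinimalDocx` and its `Write` method are hypothetical names. They are not part of NPOI or any SDK.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Security;
using System.Text;

// Hypothetical helper: writes plain-text paragraphs to a minimal .docx package.
// Only the three parts a bare WordprocessingML package needs are created;
// styles, fonts, headers and character formatting are deliberately left out.
public static class MinimalDocx
{
    const string ContentTypes =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" +
        "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">" +
        "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>" +
        "<Default Extension=\"xml\" ContentType=\"application/xml\"/>" +
        "<Override PartName=\"/word/document.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>" +
        "</Types>";

    // Package-level relationship pointing at the main document part.
    const string PackageRels =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" +
        "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">" +
        "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"word/document.xml\"/>" +
        "</Relationships>";

    public static void Write(string path, IEnumerable<string> paragraphs)
    {
        var body = new StringBuilder();
        foreach (string text in paragraphs)
        {
            // xml:space="preserve" keeps leading/trailing spaces; SecurityElement.Escape
            // escapes &, <, >, " and ' so the text is safe inside the XML.
            body.Append("<w:p><w:r><w:t xml:space=\"preserve\">")
                .Append(SecurityElement.Escape(text))
                .Append("</w:t></w:r></w:p>");
        }

        string document =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" +
            "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
            "<w:body>" + body + "</w:body></w:document>";

        using (var stream = new FileStream(path, FileMode.Create))
        using (var zip = new ZipArchive(stream, ZipArchiveMode.Create))
        {
            AddEntry(zip, "[Content_Types].xml", ContentTypes);
            AddEntry(zip, "_rels/.rels", PackageRels);
            AddEntry(zip, "word/document.xml", document);
        }
    }

    // Writes one XML part as UTF-8 without a byte order mark.
    static void AddEntry(ZipArchive zip, string name, string xml)
    {
        ZipArchiveEntry entry = zip.CreateEntry(name);
        using (var writer = new StreamWriter(entry.Open(), new UTF8Encoding(false)))
            writer.Write(xml);
    }
}
```

For example, `MinimalDocx.Write(@"D:\abc\converted.docx", new[] { "First paragraph", "Second paragraph" });` (the path is hypothetical) should produce a file with two plain paragraphs. Tables, images, styles and run formatting would need additional parts and markup. That is where a library such as the Open XML SDK saves work.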

I also faced the same problem. If you want to convert .doc to .docx, you can use the Microsoft.Office.Interop.Word library. It works for me. Here is the code.

    using System;
    using System.IO;
    using Word = Microsoft.Office.Interop.Word;

    namespace ConsoleApplication2
    {
        class Program
        {
            static void Main(string[] args)
            {
                Word._Application application = new Word.Application();
                object fileformat = Word.WdSaveFormat.wdFormatXMLDocument; // .docx
                try
                {
                    DirectoryInfo directory = new DirectoryInfo(@"D:\abc");
                    foreach (FileInfo file in directory.GetFiles("*.doc", SearchOption.AllDirectories))
                    {
                        // On Windows, "*.doc" can also match ".docx" files, so check the exact extension.
                        if (!file.Extension.Equals(".doc", StringComparison.OrdinalIgnoreCase))
                            continue;

                        object filename = file.FullName;
                        // Change only the extension; string.Replace(".doc", ".docx") would also
                        // alter any ".doc" elsewhere in the path, and ToLower() changed its case.
                        object newfilename = Path.ChangeExtension(file.FullName, ".docx");
                        Word._Document document = application.Documents.Open(filename);
                        try
                        {
                            document.Convert(); // upgrade the document to the current file format
                            document.SaveAs(newfilename, fileformat);
                        }
                        finally
                        {
                            document.Close();
                        }
                    }
                }
                finally
                {
                    // Always shut Word down, even on error, so no WINWORD.EXE process is left running.
                    application.Quit();
                }
            }
        }
    }

It should work for you too.

You can use the Open XML SDK if you want an open source option. Otherwise, .NET offers the Interop.Word API: you can open the file with it and save it as .docx.

http://msdn.microsoft.com/de-de/library/microsoft.office.interop.word(v=office.11).aspx

But this requires Word to be installed on the machine.

There was a Microsoft Bulk Conversion Tool which did this. I've found a reference here.

Otherwise I think you have no choice but to use Word Automation. After all, even OpenOffice has trouble opening some .doc files and converting them to .docx / OpenXML, which implies writing any sort of parsing tool yourself is going to be troublesome.
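Part of the difficulty is the container itself. A .docx is a ZIP package of XML parts. A .doc, by contrast, is a binary OLE compound file, documented in Microsoft's [MS-CFB] and [MS-DOC] specifications. Its content is spread across binary streams such as WordDocument and a 0Table/1Table stream. The sketch below covers only the easy first step: telling the two containers apart by their signature bytes instead of trusting the file extension. `WordFileKind` and `WordFormat` are hypothetical names, and actually extracting text from the compound file is not attempted.

```csharp
using System.IO;
using System.Linq;

// Containers distinguishable from the first bytes of a file.
public enum WordFormat
{
    Unknown,
    CompoundFile, // OLE compound file: legacy .doc (also .xls, .ppt, ...)
    ZipPackage    // ZIP archive: .docx (also .xlsx, .pptx, any .zip)
}

// Hypothetical helper: classifies a file by its signature bytes, not its extension.
public static class WordFileKind
{
    // Compound File Binary header signature.
    static readonly byte[] CompoundFileSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    // ZIP local file header signature ("PK\x03\x04").
    static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    public static WordFormat Detect(byte[] header)
    {
        if (StartsWith(header, CompoundFileSignature)) return WordFormat.CompoundFile;
        if (StartsWith(header, ZipSignature)) return WordFormat.ZipPackage;
        return WordFormat.Unknown;
    }

    public static WordFormat DetectFile(string path)
    {
        var header = new byte[8];
        int read;
        using (FileStream stream = File.OpenRead(path))
            read = stream.Read(header, 0, header.Length); // fewer than 8 bytes for tiny files
        return Detect(header.Take(read).ToArray());
    }

    static bool StartsWith(byte[] data, byte[] prefix) =>
        data.Length >= prefix.Length && data.Take(prefix.Length).SequenceEqual(prefix);
}
```

For example, `WordFileKind.DetectFile(@"D:\abc\report.doc")` (hypothetical path) would return `CompoundFile` for a genuine Word 97-2003 file. Everything after that point, such as locating the text inside the WordDocument stream, is the part that makes a home-grown parser a large undertaking.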