4
votes

I'm looking for a .NET library that will allow creation of a Word document. I need to export HTML based content to a Word doc (97-2003 format, not docx).

I know that there are the Microsoft Office Automation libraries and Office interop, but as far as I can tell, they require Office to actually be installed, and they do the conversion by opening Word itself. I don't want the conversion to depend on having Office installed.

Edit: Converting to RTF may even work, if possible.


7 Answers

6
votes

"Would it work if I somehow converted the CSS to be embedded in the HTML?"

Yes. I use an internal style sheet, as I mentioned.

Document Example:

<html>
<head>
<style type="text/css">
    h1 {text-align:center; font-size:12.0pt; font-family:Arial; font-weight:bold;}

    p {margin:0in; margin-bottom:0pt; font-size:10.0pt; font-family:Arial;}
    p.Address {text-align:center; font-family:Times; margin-bottom:10px;}
</style>
</head>
<body>
<p class="Address">The Street</p>
<h1>Head</h1>
</body>
</html>
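If the CSS currently lives in a separate file, one way to get it into an internal style sheet like the one above is to replace each stylesheet link with a style block before handing the file to Word. This is only a rough sketch, in Python for brevity: `embed_stylesheets` and the `load_css` callback are names I made up for illustration, and the regex matching is an assumption that will miss unusual markup (a real HTML parser would be more robust).

```python
import re

# Matches any <link ...> tag; we decide below whether it is a stylesheet link.
LINK_RE = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
HREF_RE = re.compile(r"href\s*=\s*[\"']([^\"']+)[\"']", re.IGNORECASE)
REL_RE = re.compile(r"rel\s*=\s*[\"']?stylesheet[\"']?", re.IGNORECASE)


def embed_stylesheets(html, load_css):
    """Replace <link rel="stylesheet" href="..."> tags with <style> blocks.

    load_css is a hypothetical callback: it receives the href value and
    returns the CSS text (for example by reading the file from disk).
    """
    def replace(match):
        tag = match.group(0)
        href = HREF_RE.search(tag)
        # Leave non-stylesheet links (icons etc.) untouched.
        if href is None or not REL_RE.search(tag):
            return tag
        css = load_css(href.group(1))
        return '<style type="text/css">\n' + css + "\n</style>"

    return LINK_RE.sub(replace, html)
```

Calling `embed_stylesheets(page, lambda href: open(href).read())` would read each linked file relative to the current directory; that lookup rule is also just an assumption for the example.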
3
votes

I use Aspose for working with Word; it makes everything a breeze: http://www.aspose.com/

2
votes

I have found that a document written out as HTML but given a .doc extension will open properly formatted in Word. I tested with Word 2000 and a file with an internal style sheet.
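A minimal sketch of that trick, in Python for brevity (the same few lines carry over to any language that can write a text file). The sample HTML, the function name `save_html_as_doc`, and the UTF-8 encoding are my own placeholders and assumptions, not part of any library.

```python
from pathlib import Path

# Hypothetical sample content; any HTML with an internal <style> block will do.
HTML = """<html>
<head>
<style type="text/css">
    h1 {text-align:center; font-size:12.0pt; font-family:Arial; font-weight:bold;}
    p {margin:0in; font-size:10.0pt; font-family:Arial;}
</style>
</head>
<body>
<h1>Head</h1>
<p>Body text</p>
</body>
</html>
"""


def save_html_as_doc(html, path):
    """Write HTML text to a file whose name ends in .doc.

    No conversion happens here: the file content stays plain HTML and
    only the extension changes.
    """
    target = Path(path)
    if target.suffix.lower() != ".doc":
        target = target.with_suffix(".doc")
    target.write_text(html, encoding="utf-8")
    return target
```

For example, `save_html_as_doc(HTML, "report.html")` writes `report.doc` next to where the script runs.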

1
votes

Using Word Automation from ASP.NET is not a good idea (see the MSKB - http://support.microsoft.com/default.aspx?scid=kb;EN-US;q257757#kb2)

If you are not using WinForms, your best option IMHO is to generate RTF, which MS Word will happily open (see the link in the already referenced article).
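To give an idea of how little is needed for plain text, here is a rough sketch of an RTF writer, in Python for brevity. The helper names `rtf_escape` and `build_rtf` are made up for this example, and it only covers paragraphs in a single font; converting real HTML to RTF would need considerably more than this.

```python
def rtf_escape(text):
    """Escape text for use inside an RTF document body."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\line ")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit value per UTF-16 code unit;
            # the trailing "?" is the fallback for readers without Unicode support.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little", signed=True)
                out.append(f"\\u{unit}?")
    return "".join(out)


def build_rtf(paragraphs, font="Arial", size_pt=10):
    """Return a minimal RTF document with one paragraph per input string."""
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 " + font + ";}}"
    # \fs is measured in half-points, so 10pt becomes \fs20.
    body = "".join(
        f"\\pard\\f0\\fs{size_pt * 2} {rtf_escape(p)}\\par\n" for p in paragraphs
    )
    return header + "\n" + body + "}"
```

Writing the returned string to a file with an .rtf extension (using an ASCII-compatible encoding, since all non-ASCII characters are already escaped) should give something Word can open.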

Good Luck!

0
votes

Since the doc format specification is not open, and the interop assemblies are the Microsoft solution, I fear that they are your primary (or even only) option.

They do indeed require Office to be installed, and they open Word (although showing a window is optional).

I think Word can open HTML documents; is that an option for you?

0
votes

I tried just opening the HTML directly in Word, which technically works except for one thing: my HTML doc also contains CSS, and when opening it, Word completely ignores the CSS, so I lose all of the formatting. I realize that I wouldn't get everything out of the CSS, but I would at least like to keep the specified fonts, font sizes, etc. Is there any way to get it to read the CSS? Would it work if I somehow converted the CSS to be embedded in the HTML?

0
votes

There's a tool called JODConverter that hooks into OpenOffice to expose its file format converters. Versions are available as a webapp (it sits in Tomcat), which you post documents to, and as a command-line tool. I've been firing HTML at it and converting to .doc and PDF successfully. It's part of a fairly big project that hasn't gone live yet, but I think I'm going to be using it. http://sourceforge.net/projects/jodconverter/