I am writing a custom Lucene.NET indexer to enable indexing of MS Word documents. The indexer must be able to handle the last three releases of MS Word: 2010, 2007, and 2003.
My plan is to use the VSTO interop assemblies that are installed as part of VS2010 to extract the text content from the documents.
Is there a better way to implement Word document indexing? If I take this approach, will I have to install all three versions of Word on the server, or just Word 2010?
Tools/Environment:
- Lucene.NET 2.3.1.3
- VS2010 / .NET 3.5
- Windows 2008 / IIS 7
Note: For details on how to implement this, see "Sitecore text search in PDF or Word documents".