1
votes

I need a way to manipulate .docx (Microsoft Office 2007) documents from within PHP.

I need to:

  1. Read the internal text
  2. Convert to .html
  3. View the result inside a browser
  4. Replace text

I know I can use Word Automation by creating a COM object of Microsoft Word, but it's slow and unstable, and it requires Word to be installed on the server.

Is there any library or code that can do it from PHP?


5 Answers

2
votes

There is PHPWord for that by the authors of PHPExcel.

1
votes

A .docx file is just a ZIP archive containing multiple XML files plus embedded media such as images, so you can read and edit the document fairly easily: unzip it, open word/document.xml, read or modify it, and repack the files.
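For the reading part, here is a minimal sketch using only PHP's bundled zip and DOM extensions. The function name docxToText() is my own invention, not part of any library, and the code assumes the main text lives in word/document.xml (headers, footers and footnotes sit in other parts and are ignored).

```php
<?php
// Sketch: extract plain text from a .docx with ext-zip and ext-dom only.
// docxToText() is a hypothetical helper name, not a library function.
function docxToText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a ZIP archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found');
    }

    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    // WordprocessingML namespace used by the w: prefix.
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    // One output line per paragraph (w:p); text is stored in w:t elements.
    $paragraphs = [];
    foreach ($xpath->query('//w:p') as $p) {
        $text = '';
        foreach ($xpath->query('.//w:t', $p) as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }
    return implode("\n", $paragraphs);
}
```

You would call it as `echo docxToText('report.docx');`, where report.docx is a placeholder file name. Paragraphs nested inside text boxes may show up twice with this simple query, so treat it as a starting point.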

Converting to HTML may be difficult. But if the document was saved with a thumbnail, you'll find a preview image of the first page in docProps/thumbnail.jpeg.
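To give an idea of what a hand-rolled conversion involves, here is a rough sketch that turns paragraphs into `<p>` elements and bold runs into `<strong>`. The function name docxToBasicHtml() is made up for illustration, and the code deliberately ignores tables, lists, images, hyperlinks and styles, which is where the real difficulty lies.

```php
<?php
// Sketch: very basic .docx to HTML conversion (paragraphs and bold only).
// docxToBasicHtml() is a hypothetical helper name, not a library function.
function docxToBasicHtml(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a ZIP archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found');
    }

    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    $html = '';
    foreach ($xpath->query('//w:p') as $p) {
        $line = '';
        foreach ($xpath->query('.//w:r', $p) as $run) {
            $text = '';
            foreach ($xpath->query('./w:t', $run) as $t) {
                $text .= $t->textContent;
            }
            if ($text === '') {
                continue;
            }
            // Escape the text so it is safe to embed in HTML.
            $text = htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
            // A run is bold if w:rPr contains w:b that is not switched off.
            $bold = $xpath->query(
                './w:rPr/w:b[not(@w:val="0") and not(@w:val="false")]',
                $run
            )->length > 0;
            $line .= $bold ? "<strong>$text</strong>" : $text;
        }
        $html .= "<p>$line</p>\n";
    }
    return $html;
}
```

Bold inherited from a paragraph or character style is not detected here, since only direct run formatting is checked.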

Note that you'll have to familiarize yourself with the XML structure to do any complex edits. The archive also contains a summary file, docProps/app.xml, which holds some metadata for the document, so don't forget to update it. Read more on Wikipedia: http://en.wikipedia.org/wiki/Office_Open_XML

0
votes

You may have a look at PHPDocX; I believe it does everything you are asking for.

  1. You may replace variables in a template, or just plain text, in a pre-existing Word document.
  2. It offers quite a few conversion options.
  3. You can also extract the text.

0
votes

You can work with the internal format directly.

DOCX is just a zip file, and inside that there's word/document.xml containing the actual document.

It's quite simple to unzip the file, read document.xml, str_replace() what you're looking for, save it, and re-zip the directory. That makes for a lightweight, quick and easy mail-merge capability for Word documents. The same approach also works for other Office formats.
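The steps above can be sketched with ZipArchive, which lets you rewrite the entry in place instead of unpacking to a directory. The helper name replaceInDocx() and the `{{name}}` placeholder style are assumptions of mine, not anything Word or PHP defines. It needs PHP 7.4 or later for the arrow function.

```php
<?php
// Sketch: mail-merge style text replacement inside a .docx.
// replaceInDocx() is a hypothetical helper name, not a library function.
function replaceInDocx(string $source, string $target, array $replacements): void
{
    // Work on a copy so the template stays untouched.
    if (!copy($source, $target)) {
        throw new RuntimeException("Cannot copy $source to $target");
    }
    $zip = new ZipArchive();
    if ($zip->open($target) !== true) {
        throw new RuntimeException("Cannot open $target as a ZIP archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    if ($xml === false) {
        $zip->close();
        throw new RuntimeException('word/document.xml not found');
    }

    // Escape the new values so characters like & and < keep the XML well-formed.
    $escaped = array_map(
        fn (string $v): string => htmlspecialchars($v, ENT_XML1 | ENT_QUOTES, 'UTF-8'),
        $replacements
    );
    $xml = str_replace(array_keys($escaped), array_values($escaped), $xml);

    // Adding an entry under an existing name replaces the old entry.
    $zip->addFromString('word/document.xml', $xml);
    $zip->close();
}
```

A call might look like `replaceInDocx('template.docx', 'letter.docx', ['{{name}}' => 'Alice']);`, with both file names being placeholders. One caveat: Word often splits what looks like a single word into several runs in the XML, so a placeholder typed in Word may not appear as one contiguous string in document.xml, and str_replace() will then miss it.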

Here's the official docs on the internal structure for more information.

0
votes

There is also a PHP class for merging new content into an existing .docx file. It is available here: http://www.tinybutstrong.com/. The documentation is pretty good and includes many examples, and it is all free and open source. It does require familiarity with the .docx concepts, though.