The Git website has detailed instructions for version-controlling Microsoft Word .doc files with catdoc.

http://git-scm.com/book/en/Customizing-Git-Git-Attributes

However, I realized that this doesn't work for .docx files. It seems that you need either docx2txt or unoconv instead of catdoc (found here). I went with docx2txt for no particular reason, but I got stuck installing it on Mac OS X.

This sort of illustrates the steps. As I understand it, all you need is docx2txt.pl somewhere sensible. I thought /usr/local/bin/ would do, so I copied it there. Then, following the instructions, I tried the following:

$ cd /usr/local/bin/
$ echo '#!/bin/bash
docx2txt.pl "$1" -' > docx2txt

When I tried this:

$ docx2txt

I got

Can't read docx file <>!

So docx2txt seems to be on the path.

Then I edited .gitattributes in the repository folder (ASCII, LF) to add the following line:

*.docx diff=wordx

Then, I also edited .git/config file in the repository as follows:

[diff "wordx"]
    binary = true
    textconv = docx2txt

Because the repository was already in use, I didn't run git init. I edited a .docx Word file in the repository and then ran git diff in the Terminal, but Git still treated the file as binary:

Binary files a/foo/foo.docx and b/foo/foo.docx differ

Does anyone have any suggestions?

Did you chmod +x on the docx2txt? - evading
I didn't, but thanks to your post, I somehow realized I was using the wrong directory, /usr/bin/ instead of /usr/local/bin/. This solved the installation problem, but I couldn't get any further. I'll update my question. - Kouichi C. Nakamura
The only thing I can see that I've done differently from the above is running sudo make to install docx2txt. I'm now diffing .dotx files flawlessly on OSX, thanks! (The binary = true option is not needed, by the way, so .git/config can be set from the command line like this: git config diff.wordx.textconv docx2txt, where docx2txt is the wrapper given above, placed somewhere in $PATH. If the wrapper can't find docx2txt.pl, the script won't run and you will fall back to the usual "binary files a/blah b/blah differs".) - klang
Cheers. Sounds promising. I'll give it a try later. - Kouichi C. Nakamura

1 Answer

Thanks to klang, I got it working. Now I can diff .docx files in Terminal.app on Mac OS X (10.9), although this doesn't work seamlessly with the SourceTree GUI. The steps below are basically the same as klang's, with minor corrections.

Download and install the docx2txt converter from http://docx2txt.sourceforge.net/

wget -O doc2txt.tar.gz 'http://docx2txt.cvs.sourceforge.net/viewvc/docx2txt/?view=tar'
tar zxf doc2txt.tar.gz
cd docx2txt/docx2txt/
sudo make

Then create a small wrapper script so that docx2txt writes to STDOUT:

echo '#!/bin/bash
docx2txt.pl "$1" -' > /usr/local/bin/docx2txt
chmod +x /usr/local/bin/docx2txt
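
The chmod +x step is the one that was missing from my first attempt. As a rough sketch of why it matters, the following builds the same kind of wrapper in a throwaway directory and checks the executable bit before and after. The stand-in docx2txt.pl, the temporary directory, and the file name sample.docx are made up for the demo so that it runs without the real converter installed, and the demo wrapper uses #!/bin/sh purely for portability.

```shell
#!/bin/sh
# Sketch only: everything lives in a temporary directory, not /usr/local/bin.
set -e
bindir=$(mktemp -d)

# Hypothetical stand-in for docx2txt.pl, so the demo needs no real converter.
printf '#!/bin/sh\necho "text of $1"\n' > "$bindir/docx2txt.pl"
chmod +x "$bindir/docx2txt.pl"

# Wrapper with the same body as above: pass the file, write to STDOUT ("-").
printf '#!/bin/sh\ndocx2txt.pl "$1" -\n' > "$bindir/docx2txt"

# A file created by redirection is not executable yet.
if [ -x "$bindir/docx2txt" ]; then before=yes; else before=no; fi

chmod +x "$bindir/docx2txt"
if [ -x "$bindir/docx2txt" ]; then after=yes; else after=no; fi

# With the bit set and the directory on PATH, the wrapper runs as a command.
PATH="$bindir:$PATH"
out=$(docx2txt sample.docx)
echo "executable before chmod: $before, after chmod: $after"
echo "$out"
```

If the wrapper is not executable, Git cannot run it as the textconv command either, which is one way to end up back at "Binary files ... differ".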

Git attributes for (Word) .docx diffing in your repository

echo "*.docx diff=wordx" >> .gitattributes
git config diff.wordx.textconv docx2txt

Use .git/info/attributes if the setting should not be committed with the project.
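
If git diff still reports binary files, it may help to ask Git what it thinks the configuration is. The sketch below does that in a throwaway repository with git check-attr and git config --get; the temporary repository and the path sample.docx are placeholders I made up, and in a real repository you would run only the last two commands against your own file.

```shell
#!/bin/sh
# Sketch only: a scratch repository, so nothing in a real project is touched.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Same two settings as above.
echo "*.docx diff=wordx" >> .gitattributes
git config diff.wordx.textconv docx2txt

# Which diff driver does Git assign to a .docx path?
# (check-attr works on the path alone; the file does not need to exist.)
attr=$(git check-attr diff -- sample.docx)
echo "$attr"

# Which command will Git run for that driver?
conv=$(git config --get diff.wordx.textconv)
echo "$conv"
```

If the first command reports "unspecified" instead of "wordx", the .gitattributes line is not being picked up; if the second prints nothing, the textconv setting is missing from the repository's configuration.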

Git attributes for (Word) .doc diffing

echo "*.doc diff=word" >> .gitattributes
git config diff.word.textconv strings