7
votes

I have N Word documents (Office 2003) from which I want to make a single Word document by merging all the N documents together in some order. How do I go about doing this in Ruby? Thanks!

Only the documents themselves were created in MS Office. I do not use Windows and would prefer non-Windows solutions.

EDIT: Would this be easier if the docs were ODT files rather than DOC files?

3
@Vijay Dev: To answer your edit, the answer is: maybe. You still have to do the conversion to ODT from DOC, which is one extra step. If you have to then convert them back to DOC, it's yet another step. If you're familiar with OOo and programming against it, it may be easier, but either way it's going to take a little elbow grease. - Todd Main
I use JODConverter in some other application. I can use it to do the ODT to DOC conversion, I think. - Vijay Dev
@Vijay Dev: does the below answer your question? - Todd Main
Hi Otaku, haven't had the time to check this out. Will let you know soon. Thanks! - Vijay Dev
@Otaku: Sorry, but how do I use what is mentioned in that link? - Vijay Dev

3 Answers

4
votes

The only non-Windows solution that I know of is the Ruby bindings for POI. After that, the code would be really similar to this .NET code: Merge Word Documents As Pages Of A Single Document Using VB.NET. The key call there is Selection.InsertFile, used once per document, in the order you choose.

For ODT document merges, see this thread: http://cpanforum.com/threads/9938

3
votes

There is a whole series of really good articles about Word and Ruby at http://rubyonwindows.blogspot.com/search/label/word. Word files are really complicated, at least before 2007, so you're better off automating Word to do it.

0
votes

Understand, almost any answer to this question will depend on the constraints of the doc files you are using...

That being said, in my mind the first option would be to convert them to a more easily parsed format. RTF is a great example, and if you can get them into this format, the RTF Pocket Guide from O'Reilly is a GREAT resource for understanding the structure of the files. Converting the files is pretty simple if you can install AbiWord on the Linux machine. From a command line, you'd just run:

abiword --to=rtf some_file_name.doc

Of course, in Ruby you'd just wrap these commands.
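
A minimal sketch of such a wrapper, in case it helps. The helper names (abiword_command, convert_all) are made up for illustration, and it assumes abiword is on your PATH and writes its output next to the input file with the new extension:

```ruby
# Hypothetical helpers; the names are illustrative, not part of any library.

# Build the argument list for one AbiWord conversion,
# e.g. ["abiword", "--to=rtf", "some_file_name.doc"].
def abiword_command(path, format)
  ["abiword", "--to=#{format}", path]
end

# Convert each file in the given order and return the expected output paths.
# `runner` defaults to Kernel#system and can be swapped out (e.g. for testing).
def convert_all(paths, format, runner: ->(cmd) { system(*cmd) })
  paths.map do |path|
    ok = runner.call(abiword_command(path, format))
    raise "abiword failed for #{path}" unless ok
    # Assumption: AbiWord writes the result beside the input file,
    # with the extension replaced by the target format.
    path.sub(/\.[^.\/]+\z/, ".#{format}")
  end
end

# Example call (not run here):
#   convert_all(%w[first.doc second.doc], "rtf")
```

Passing the arguments to system as separate strings, rather than one shell string, avoids quoting problems with file names that contain spaces.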

It's the merging that is more complicated, and it will depend on your files. You'll have to make some programmer decisions about whether you're going to combine the stylesheets in each individual doc, the font tables, and so on. The content just sits in the middle of that RTF file, but it's all the semantic and style data that you'll have to make choices about. There is no 'one way' here, simply because it depends on what you want on the other side. Here is where the RTF Pocket Guide is a great help: basically you'll want to use it to understand the structure of your RTF files, and decide what you do and don't want.

Otherwise, if you just want the content with NONE of the semantics, you could always convert them to txt files, then concat them. The command is very similar:

abiword --to=txt some_file_name.doc

This is dead simple: it will just spit out the text, and you can concat it and be done with it. But again, you'll lose ALL formatting of any sort.
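
For the concat step, something along these lines should do. It is only a sketch: concat_text_files is a made-up name, and it assumes the .txt files already exist and that a blank line between documents is an acceptable separator:

```ruby
# Hypothetical helper: append the text files, in the given order,
# into one output file, with a blank line between documents.
def concat_text_files(inputs, output)
  File.open(output, "w") do |out|
    inputs.each_with_index do |path, i|
      out.write("\n") unless i.zero?   # blank line between documents
      text = File.read(path)
      out.write(text)
      # Make sure each document ends with a newline before the next starts.
      out.write("\n") unless text.end_with?("\n")
    end
  end
  output
end

# Example call (not run here):
#   concat_text_files(%w[first.txt second.txt], "merged.txt")
```

The order of the input array is the order of the merged document, so sort or arrange the list however you need before calling it.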