11
votes

I need to write a java application which can merge docx files. Any suggestions?

4
By "merge," do you mean some simple sort of concatenation? Or something fancier? Is the difficulty the merge part or the docx (rather than doc) part? - Matt Ball
Merge should give the same result as if we manually open in MS Office first document, press Ctrl+C, then open second document, go to its end and press Ctrl+V. - Roman

4 Answers

25
votes

With POI my solution is:

public static void merge(InputStream src1, InputStream src2, OutputStream dest) throws Exception {
    OPCPackage src1Package = OPCPackage.open(src1);
    OPCPackage src2Package = OPCPackage.open(src2);
    XWPFDocument src1Document = new XWPFDocument(src1Package);        
    CTBody src1Body = src1Document.getDocument().getBody();
    XWPFDocument src2Document = new XWPFDocument(src2Package);
    CTBody src2Body = src2Document.getDocument().getBody();        
    appendBody(src1Body, src2Body);
    src1Document.write(dest);
}

private static void appendBody(CTBody src, CTBody append) throws Exception {
    XmlOptions optionsOuter = new XmlOptions();
    optionsOuter.setSaveOuter();
    String appendString = append.xmlText(optionsOuter);
    String srcString = src.xmlText();
    String prefix = srcString.substring(0,srcString.indexOf(">")+1);
    String mainPart = srcString.substring(srcString.indexOf(">")+1,srcString.lastIndexOf("<"));
    String sufix = srcString.substring( srcString.lastIndexOf("<") );
    String addPart = appendString.substring(appendString.indexOf(">") + 1, appendString.lastIndexOf("<"));
    CTBody makeBody = CTBody.Factory.parse(prefix+mainPart+addPart+sufix);
    src.set(makeBody);
}

With Docx4j my solution is:

public class MergeDocx {
    private static long chunk = 0;
    private static final String CONTENT_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

    public void mergeDocx(InputStream s1, InputStream s2, OutputStream os) throws Exception {
        WordprocessingMLPackage target = WordprocessingMLPackage.load(s1);
        insertDocx(target.getMainDocumentPart(), IOUtils.toByteArray(s2));
        SaveToZipFile saver = new SaveToZipFile(target);
        saver.save(os);
    }

    private static void insertDocx(MainDocumentPart main, byte[] bytes) throws Exception {
            AlternativeFormatInputPart afiPart = new AlternativeFormatInputPart(new PartName("/part" + (chunk++) + ".docx"));
            afiPart.setContentType(new ContentType(CONTENT_TYPE));
            afiPart.setBinaryData(bytes);
            Relationship altChunkRel = main.addTargetPart(afiPart);

            CTAltChunk chunk = Context.getWmlObjectFactory().createCTAltChunk();
            chunk.setId(altChunkRel.getId());

            main.addObject(chunk);
    }
}
8
votes

The following Java APIs are available to handle OpenXML MS Word documents with Java:

There was one more, but I don't recall the name anymore.

As to your functional requirement: merging two documents is technically tricky to achieve the result as the enduser would expect. Most API's won't allow that. You'll need to extract the desired information from two documents and then create one new document based on this information yourself.

1
votes

It sure looks like POI can work with docx files. Are you trying to figure out how to merge them?

How to extract plain text from a DOCX file using the new OOXML support in Apache POI 3.5?

1
votes

Aspose API is the best so far for merging word doc or docx files so far but that is not free or open source, if you need a free and open source tools there are couple of API you can choose from, you can find a review on them here,

http://www.esupu.com/open-source-office-document-java-api-review/