2
votes

Does anyone know how to merge (concatenate) docx documents with PHP (or Python if it's not possible in PHP)?

To clarify, my server is Linux based. I have 2 existing docx documents, and I need to combine them into a new docx document using PHP or possibly Python.

3
Since .docx is theoretically XML, you should be able to parse it and construct a new XML document from it. I'm not involved enough in the Microsoft universe to give you more details though. - deceze
@deceze, .docx is actually a zip file with XML and a ton of other resources (images and what not). Merging them is a bit more tricky than it seems, unfortunately. - Brad
@Yongke, is there any way you can call MS Word over COM on a Windows server? You will get a lot farther this way, using Word to do the actual work. Perhaps you can fire up a virtual instance? - Brad
@Brad Good to know. Why am I not surprised that "Office Open XML" is not easy to work with after all? :o) - deceze
@deceze, 'eh, I'll take it over original doc format. You just have to keep in mind all that is supported. How do you handle different paper sizes? Placement of images? Formatting? etc. Much easier to let Word do the work. Even if you could automate Open Office to do this, the compatibility isn't anywhere near 100%, except for basic documents. - Brad

3 Answers

7
votes

Merging two different Docx files can be very complicated, because headers, styles, charts, comments, tracked changes and other special content are saved in separate XML sub-files inside each Docx. As a result, two Docx files may contain different objects that share the same ids. In the general case you would have to list every possible object in both documents, give each one a new internal id, and update every reference to it in the merged document, which is a huge job. Probably only MS Office can do this currently.
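To make the id problem concrete, here is a minimal Python sketch (Python being the fallback mentioned in the question) that compares the relationship ids of two documents. It assumes the ids of interest are those declared in "word/_rels/document.xml.rels"; the function names `relationship_ids` and `colliding_ids` are made up for this illustration, and other kinds of ids (styles, numbering, comments) are not covered.

```python
import zipfile
import xml.etree.ElementTree as ET

# Part that maps relationship ids (rId1, rId2, ...) to images, headers, etc.
RELS_PART = "word/_rels/document.xml.rels"
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def relationship_ids(docx_path):
    """Map each relationship id of one docx to the part it targets."""
    with zipfile.ZipFile(docx_path) as archive:
        if RELS_PART not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read(RELS_PART))
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root.iter(RELS_NS + "Relationship")
    }


def colliding_ids(first_path, second_path):
    """Return ids used by both documents but pointing at different targets."""
    first = relationship_ids(first_path)
    second = relationship_ids(second_path)
    return {
        rid: (first[rid], second[rid])
        for rid in first.keys() & second.keys()
        if first[rid] != second[rid]
    }
```

Calling `colliding_ids('doc1.docx', 'doc2.docx')` returns a dictionary of the ids that would have to be renumbered; an empty result only says that this particular kind of id does not clash.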

Nevertheless, if you know that the two documents to be merged use the same styles, and that they contain no charts, headers or other special objects, then merging becomes quite easy.

In this case, you only have to use a Zip reader, such as TbsZip, to open the first Docx file (which is technically a zip archive containing XML sub-files), then read the sub-file "word/document.xml" and extract the part between the tags <w:body> and </w:body>. In the second Docx file, open "word/document.xml" as well and insert the extracted content just before the tag </w:body>. Save the result in a new Docx file.

This can be done using TbsZip, like this:

<?php

include_once('tbszip.php');

$zip = new clsTbsZip();

// Open the first document
$zip->Open('doc1.docx');
$content1 = $zip->FileRead('word/document.xml');
$zip->Close();

// Extract the content of the first document
$p = strpos($content1, '<w:body');
if ($p===false) exit("Tag <w:body> not found in document 1.");
$p = strpos($content1, '>', $p);
$content1 = substr($content1, $p+1);
$p = strpos($content1, '</w:body>');
if ($p===false) exit("Tag </w:body> not found in document 1.");
$content1 = substr($content1, 0, $p);

// Insert into the second document
$zip->Open('doc2.docx');
$content2 = $zip->FileRead('word/document.xml');
$p = strpos($content2, '</w:body>');
if ($p===false) exit("Tag </w:body> not found in document 2.");
$content2 = substr_replace($content2, $content1, $p, 0);
$zip->FileReplace('word/document.xml', $content2, TBSZIP_STRING);

// Save the merge into a third file, then release the source archive
$zip->Flush(TBSZIP_FILE, 'merge.docx');
$zip->Close();
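
Since the question also allows Python, the same steps can be sketched with the standard `zipfile` module. This is a rough port under the same assumptions as above (identical styles, no headers, charts or other special objects), and it additionally assumes that "word/document.xml" is UTF-8 encoded. The names `body_inner_xml` and `merge_docx` are made up for this example.

```python
import zipfile

BODY_PART = "word/document.xml"


def body_inner_xml(document_xml):
    """Return the text between the <w:body ...> and </w:body> tags."""
    start = document_xml.find("<w:body")
    if start == -1:
        raise ValueError("Tag <w:body> not found.")
    start = document_xml.find(">", start) + 1
    end = document_xml.find("</w:body>", start)
    if end == -1:
        raise ValueError("Tag </w:body> not found.")
    return document_xml[start:end]


def merge_docx(first_path, second_path, output_path):
    """Append the body of the first docx to the end of the second one."""
    with zipfile.ZipFile(first_path) as first:
        content1 = body_inner_xml(first.read(BODY_PART).decode("utf-8"))

    with zipfile.ZipFile(second_path) as second:
        with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as merged:
            # Copy every part of the second document, patching only the body.
            for name in second.namelist():
                data = second.read(name)
                if name == BODY_PART:
                    text = data.decode("utf-8")
                    end = text.find("</w:body>")
                    if end == -1:
                        raise ValueError("Tag </w:body> not found in document 2.")
                    data = (text[:end] + content1 + text[end:]).encode("utf-8")
                merged.writestr(name, data)
```

As in the PHP version, `merge_docx('doc1.docx', 'doc2.docx', 'merge.docx')` places the content of the first document after the content of the second one; swap the arguments for the opposite order.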
0
votes

You can merge two Word documents with PHPDocX in a single method call: (Source: Merging Word documents with PHPDocX)

require_once 'path /classes/DocxUtilities.inc';
$newDocx = new DocxUtilities();
$myOptions = array('mergeType' => 0);
$newDocx->mergeDocx('firstWordDoc.docx', 'secondWordDoc.docx', 'mergedWord.docx',
                    $myOptions);  

This merge lets you preserve the whole section structure (paper size, margins, associated footers and headers, ...), includes all the required styles, manages all lists (this may seem trivial, but it is not so in the OOXML standard), and preserves images and charts as well as footnotes, endnotes and comments.

Moreover, there is an option to preserve the original numbering (by default, the page numbering continues).

Via the mergeType option, you may also discard the section structure of the merged document and add its content to the end of the first document as part of its last section. In this case, of course, the headers and footers are not imported, but all other elements are still preserved.

0
votes

Aspose.Words Cloud SDK for PHP can merge/join several Word documents into one Word document while keeping the formatting of either the appended or the destination document, depending on the ImportFormatMode parameter value. Note that it is a commercial API, but the free pricing plan allows 150 free API calls per month.

<?php

require_once('D:\xampp\htdocs\aspose-words-cloud-php-master\vendor\autoload.php');

//TODO: Get your ClientId and ClientSecret at https://dashboard.aspose.cloud (free registration is required).

$ClientSecret="xxxxxxxxxxxxxxxxxxxxxxxxxxxx";
$ClientId="xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx";

$wordsApi = new Aspose\Words\WordsApi($ClientId,$ClientSecret);


try {

    $remoteDataFolder = "Temp";
    $localFile = "C:/Temp/02_pages_adobe.docx";
    $remoteFileName = "02_pages_adobe.docx";
    $localFile1 = "C:/Temp/Sections.docx";
    $remoteFileName1 = "Sections.docx";
    $outputFileName = "TestAppendDocument.docx";

        
    $uploadRequest = new Aspose\Words\Model\Requests\UploadFileRequest($localFile,$remoteDataFolder."/".$remoteFileName,null);
    $wordsApi->uploadFile($uploadRequest);
    $uploadRequest1 = new Aspose\Words\Model\Requests\UploadFileRequest($localFile1,$remoteDataFolder."/".$remoteFileName1,null);
    $wordsApi->uploadFile($uploadRequest1);

    $requestDocumentListDocumentEntries0 = new Aspose\Words\Model\DocumentEntry(array(
            "href" => $remoteDataFolder . "/" . $remoteFileName1,
            "import_format_mode" => "KeepSourceFormatting",
        ));
    $requestDocumentListDocumentEntries = [
            $requestDocumentListDocumentEntries0,
        ];
    $requestDocumentList = new Aspose\Words\Model\DocumentEntryList(array(
            "document_entries" => $requestDocumentListDocumentEntries,
        ));
    $request = new Aspose\Words\Model\Requests\AppendDocumentRequest(
            $remoteFileName,
            $requestDocumentList,
            $remoteDataFolder,
            NULL,
            NULL,
            NULL,
            $remoteDataFolder . "/" . $outputFileName,
            NULL,
            NULL
        );

    $result = $wordsApi->appendDocument($request);
        
    // Download the merged file
    $request = new Aspose\Words\Model\Requests\DownloadFileRequest($remoteDataFolder."/".$outputFileName,NULL,NULL);
    $result = $wordsApi->downloadFile($request);
    copy($result->getPathName(),"AppendOutput.docx");

    
} catch (Exception $e) {
    // Report the failure; PHP_EOL ends the line portably
    echo "Something went wrong: ", $e->getMessage(), PHP_EOL;
}

?>

P.S.: I'm a developer evangelist at Aspose.