38
votes

I am looking for a way to convert Word and Excel files to PDF using PHP.

The reason for this is that I need to be able to combine files of various formats into one document. I know that if I can convert everything to PDF, I can then merge the PDFs into one file using PDFMerger (which uses FPDF).

I am already able to create PDFs from other file types/images, but am stuck with Word documents. (I think I could probably convert the Excel files using the PHPExcel library that I already use to create Excel files from HTML code.)

I do not use the Zend Framework, so I am hoping that someone will be able to point me in the right direction.

Alternatively, if there is a way to create image (jpg) files from the Word documents, that would be workable.

Thanks for any help!

Try the Google Documents API: code.google.com/apis/documents - fabrik
Of course, you'll need to upload the files to Google's cloud, under an existing Google account. - fabrik
Sadly, because of the data that is being stored, the files have to remain stored securely on the server and cannot be transferred to Google's servers. - saulposel
The Google Documents API 3.0 is now deprecated; Google has moved to the Google Drive API. - shasi kanth
Did you ever find a solution? - Mawg says reinstate Monica

12 Answers

25
votes

I found a solution to my issue and, after a request, will post it here to help others. Apologies if I miss any details; it's been a while since I worked on this solution.

The first thing required is to install OpenOffice.org on the server. I asked my hosting provider to install the OpenOffice RPM on my VPS; this can be done directly through WHM.

Now that the server can handle MS Office files, you can convert them by executing command-line instructions via PHP. To handle this, I found PyODConverter: https://github.com/mirkonasato/pyodconverter

I created a directory on the server and placed the PyODConverter Python file in it. I also created a plain text file above the web root (I named it "adocpdf") containing the following shell script:

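#!/bin/bash
directory=$1
filename=$2
extension=$3
SERVICE='soffice'
# start a headless OpenOffice.org instance if soffice is not already running
if [ "`ps ax|grep -v grep|grep -c $SERVICE`" -lt 1 ]; then
unset DISPLAY
/usr/bin/soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &
# give OpenOffice.org time to start listening before converting
sleep 5s
fi
# quote the paths so file names containing spaces are passed as single arguments
python /home/website/python/DocumentConverter.py "/home/website/$directory$filename$extension" "/home/website/$directory$filename.pdf"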

This checks whether an OpenOffice.org (soffice) process is already running, starts one in headless mode if not, and then calls the PyODConverter script to process the file and output it as a PDF. The three variables at the top of the script are supplied as arguments when the script is executed from within a PHP file. The delay ("sleep 5s") ensures that OpenOffice.org has enough time to start if required. I have used this for months now, and the 5-second gap seems to give enough breathing room.

The script will create a PDF version of the document in the same directory as the original.

Finally, here is how the conversion of a Word/Excel file is started from within PHP (I have this inside a function that checks whether the file we are dealing with is a Word/Excel document):

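//use openoffice.org (escape every argument so file names cannot inject shell commands)
$output = array();
$return_var = 0;
exec('/opt/adocpdf ' . escapeshellarg($directory) . ' ' . escapeshellarg($filename) . ' ' . escapeshellarg($extension), $output, $return_var);
// $return_var is non-zero if the conversion script failed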

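This code runs once the Word/Excel file has been uploaded to the server. The three variables in the exec() call correspond directly to the three at the start of the script above. Note that $directory requires no leading forward slash if the file for conversion is within the web root, because the script already prefixes /home/website/; it does need a trailing slash, and $extension must include the leading dot. The script file itself must be executable (e.g. chmod +x /opt/adocpdf).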
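A slightly more complete version of the exec() call above, written as a sketch: it escapes each argument with escapeshellarg(), so file names containing spaces or shell metacharacters can't break or inject into the command, captures stderr, checks the exit status, and confirms that the PDF the script should have produced exists. The /opt/adocpdf and /home/website/ paths come from this answer; the function names and the ADOCPDF_BASE constant are hypothetical.

```php
<?php
// Base directory hard-coded in the adocpdf script (taken from the answer above).
const ADOCPDF_BASE = '/home/website/';

// Build the shell command for /opt/adocpdf with every argument escaped.
// $directory: relative to ADOCPDF_BASE, no leading slash, trailing slash included.
// $extension: includes the leading dot, e.g. ".docx".
function adocpdfCommand(string $directory, string $filename, string $extension): string
{
    return '/opt/adocpdf '
        . escapeshellarg($directory) . ' '
        . escapeshellarg($filename) . ' '
        . escapeshellarg($extension);
}

// Path of the PDF the script writes next to the original document.
function adocpdfOutputPath(string $directory, string $filename): string
{
    return ADOCPDF_BASE . $directory . $filename . '.pdf';
}

// Run the conversion; returns the PDF path on success, null on failure.
function convertWithAdocpdf(string $directory, string $filename, string $extension): ?string
{
    $output = [];
    $returnVar = 0;
    // 2>&1 so error messages from soffice/python end up in $output for logging
    exec(adocpdfCommand($directory, $filename, $extension) . ' 2>&1', $output, $returnVar);
    $pdf = adocpdfOutputPath($directory, $filename);
    if ($returnVar !== 0 || !is_file($pdf)) {
        error_log('adocpdf failed (' . $returnVar . '): ' . implode("\n", $output));
        return null;
    }
    return $pdf;
}
```

For example, convertWithAdocpdf('uploads/', 'report', '.docx') would expect /home/website/uploads/report.pdf to appear.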

OK, that's it! Hopefully this will be useful to someone and save them the difficulties and learning curve I faced.

18
votes

Well, my 2 cents on Word 2007 .docx, Word 97-2004 .doc, PDF and all the other MS Office formats you wish to have "converted from y to z, but which in reality don't want to be". In my experience so far, conversion with LibreOffice or OpenOffice can't be relied on, though .doc documents tend to be better supported than Word 2007's .docx. In general it's very hard to convert .docx to .doc without breaking anything.

.docx is also extremely useful for templating, whereas .doc, being a binary format, is not.

The conversion from .doc to PDF was reliable most of the time. If you can still influence the design or content of the Word document, this might be satisfactory. In my situation, however, the documents were supplied by foreign companies, and even after generating the .docx templates, in some scenarios the generated .docx had to be slightly modified with supplementary text before it was converted to PDF.


WINDOWS BASED!

All these hiccups led me to the conclusion that the only truly reliable conversion method is to use PHP's COM class and let the MS Word or Excel application do all the work for you. I'll just give an example of converting .docx to .doc and/or PDF. If you do not have MS Office installed, you can download a 60-day trial version, which gives you enough room for testing purposes.

The COM/.NET extension is commented out by default in php.ini; just search for the line containing php_com_dotnet.dll and uncomment it like so:

  extension=php_com_dotnet.dll

Restart the web server (IIS is not a prerequisite; Apache will work just as well).
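After the restart, you can confirm from PHP that the extension was actually loaded before trying to create COM objects. A minimal sketch; comAvailable() is a hypothetical helper, while extension_loaded() and class_exists() are standard PHP functions and com_dotnet is the extension's name:

```php
<?php
// Returns true when the COM/.NET extension is loaded and the COM class exists.
// On non-Windows systems this is always false, since php_com_dotnet is Windows-only.
function comAvailable(): bool
{
    return extension_loaded('com_dotnet') && class_exists('COM');
}

if (!comAvailable()) {
    echo "COM is not available: enable extension=php_com_dotnet.dll in php.ini and restart the web server.", PHP_EOL;
}
```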

The code below demonstrates how easy it is:

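  // new COM() throws a com_exception on failure instead of returning false
  try {
      $word = new COM("Word.Application");
  } catch (com_exception $e) {
      die("Could not initialise Object: " . $e->getMessage());
  }
  // set it to 1 to see the MS Word window (the actual opening of the document)
  $word->Visible = 0;
  // recommended: 0 disables alerts like "Do you want MS Word to be the default .. etc"
  $word->DisplayAlerts = 0;
  // open the Word 2007-2013 document (use absolute paths: relative paths are
  // not resolved against the PHP script's directory)
  $word->Documents->Open('yourdocument.docx');
  // save it as Word 97-2003 (.doc); FileFormat 0 = wdFormatDocument
  $word->ActiveDocument->SaveAs('newdocument.doc', 0);
  // export to PDF (17 = wdExportFormatPDF)
  $word->ActiveDocument->ExportAsFixedFormat('yourdocument.pdf', 17, false, 0, 0, 0, 0, 7, true, true, 2, true, true, false);
  // close the document and quit the Word process without saving further changes
  $word->ActiveDocument->Close(false);
  $word->Quit(false);
  // clean up
  unset($word);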

This is just a small demonstration. All I can say is that, when it comes to conversion, this was the only truly reliable option I could use, and the one I would recommend.
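The demonstration covers Word only, but the question also asks about Excel. A sketch of the same COM approach that handles both, assuming MS Office is installed on the Windows server: Word's Document.ExportAsFixedFormat takes the output file name first and the format second (17 = wdExportFormatPDF), while Excel's Workbook.ExportAsFixedFormat takes the type first (0 = xlTypePDF) and then the file name. officeAppFor() and officeToPdf() are hypothetical names; pass absolute paths.

```php
<?php
// Map a file extension to the Office application that should open it.
function officeAppFor(string $path): ?string
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (in_array($ext, ['doc', 'docx', 'rtf'], true)) {
        return 'Word.Application';
    }
    if (in_array($ext, ['xls', 'xlsx'], true)) {
        return 'Excel.Application';
    }
    return null;
}

// Convert a Word or Excel file to PDF through COM (Windows + MS Office only).
// $source and $pdf should be absolute paths.
function officeToPdf(string $source, string $pdf): void
{
    $appName = officeAppFor($source);
    if ($appName === null) {
        throw new InvalidArgumentException("Unsupported file type: $source");
    }
    $app = new COM($appName); // throws com_exception if Office is not available
    try {
        $app->Visible = false;
        $app->DisplayAlerts = false;
        if ($appName === 'Word.Application') {
            $doc = $app->Documents->Open($source);
            $doc->ExportAsFixedFormat($pdf, 17); // 17 = wdExportFormatPDF
            $doc->Close(false);                  // close without saving changes
        } else {
            $book = $app->Workbooks->Open($source);
            $book->ExportAsFixedFormat(0, $pdf); // 0 = xlTypePDF
            $book->Close(false);                 // close without saving changes
        }
    } finally {
        $app->Quit(); // always release the Office process, even on errors
    }
}
```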

11
votes

1) I am using WAMP.

2) I have installed OpenOffice (from Apache: http://www.openoffice.org/download/).

3) $output_dir = "C:/wamp/www/projectfolder/"; is my project folder, where I want to create the output file.

4) I have already placed my input file at C:/wamp/www/projectfolder/wordfile.docx.

Then I run my code (given below):

<?php
    set_time_limit(0);

    // Build a com.sun.star.beans.PropertyValue struct for passing named arguments
    function MakePropertyValue($name, $value, $osm){
        $oStruct = $osm->Bridge_GetStruct("com.sun.star.beans.PropertyValue");
        $oStruct->Name = $name;
        $oStruct->Value = $value;
        return $oStruct;
    }

    function word2pdf($doc_url, $output_url){
        //Invoke the OpenOffice.org service manager (new COM throws if it is not installed)
        try {
            $osm = new COM("com.sun.star.ServiceManager");
        } catch (com_exception $e) {
            die("Please be sure that OpenOffice.org is installed.\n");
        }
        //Set the application to remain hidden to avoid flashing the document onscreen
        $args = array(MakePropertyValue("Hidden", true, $osm));
        //Launch the desktop
        $oDesktop = $osm->createInstance("com.sun.star.frame.Desktop");
        //Load the .doc/.docx file, and pass in the "Hidden" property from above
        $oWriterDoc = $oDesktop->loadComponentFromURL($doc_url, "_blank", 0, $args);
        //Set up the arguments for the PDF output
        $export_args = array(MakePropertyValue("FilterName", "writer_pdf_Export", $osm));
        //Write out the PDF
        $oWriterDoc->storeToURL($output_url, $export_args);
        $oWriterDoc->close(true);
    }

    $output_dir = "C:/wamp/www/projectfolder/";
    $doc_file = "C:/wamp/www/projectfolder/wordfile.docx";
    $pdf_file = "outputfile_name.pdf";

    $output_file = $output_dir . $pdf_file;
    // OpenOffice expects URLs, e.g. file:///C:/wamp/www/projectfolder/wordfile.docx
    $doc_file = "file:///" . $doc_file;
    $output_file = "file:///" . $output_file;
    word2pdf($doc_file, $output_file);
?>
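Two details are worth handling when adapting word2pdf(): OpenOffice expects proper file:/// URLs (so spaces in paths should be percent-encoded), and writer_pdf_Export is the Writer filter, so spreadsheets opened in Calc need calc_pdf_Export instead. A sketch of both helpers, assuming OpenOffice's standard PDF export filter names; pathToFileUrl() and pdfFilterFor() are hypothetical names.

```php
<?php
// Convert a local Windows (or Unix) path into a file:/// URL for loadComponentFromURL/storeToURL.
// Each path segment is percent-encoded so spaces become %20; a drive letter ("C:") is kept as-is.
function pathToFileUrl(string $path): string
{
    $path = str_replace('\\', '/', $path);
    $segments = explode('/', ltrim($path, '/'));
    foreach ($segments as $i => $segment) {
        // leave a leading drive letter such as "C:" untouched
        if (!($i === 0 && preg_match('/^[A-Za-z]:$/', $segment))) {
            $segments[$i] = rawurlencode($segment);
        }
    }
    return 'file:///' . implode('/', $segments);
}

// Pick the PDF export filter matching the application that will open the document.
function pdfFilterFor(string $path): string
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    switch ($ext) {
        case 'xls': case 'xlsx': case 'ods': case 'csv':
            return 'calc_pdf_Export';
        case 'ppt': case 'pptx': case 'odp':
            return 'impress_pdf_Export';
        default:
            return 'writer_pdf_Export';
    }
}
```

With these, the plain paths can be passed through pathToFileUrl() instead of prepending file:/// by hand, and the FilterName value can come from pdfFilterFor().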
10
votes

I successfully put a portable version of LibreOffice on my host's web server, which I call from PHP to do a command-line conversion from .docx, etc. to PDF on the fly. I do not have admin rights on my host's web server. Here is my blog post describing what I did:

http://geekswithblogs.net/robertphyatt/archive/2011/11/19/converting-.docx-to-pdf-or-.doc-to-pdf-or-.doc.aspx

Yay! Convert directly from .docx or .odt to .pdf using PHP with LibreOffice (OpenOffice's successor)!

4
votes

Open Office / LibreOffice based solutions will do an OK job, but don't expect your PDFs to resemble your source files if they were created in MS-Office. A PDF that looks 90% like the original is not considered to be acceptable in many fields.

The only way to make sure your PDFs look exactly like the originals is to use a solution that uses the official MS-Office DLLs under the hood. If you are running your PHP solution on non-Windows based servers then it requires an additional Windows Server. This may be a showstopper, but if you really care about the look and feel of your PDFs you may not have an option.

Have a look at this blog post. It shows how to use PHP to convert MS-Office files with a high level of fidelity.

Disclaimer: I wrote this blog post and worked on a related commercial product, so consider me biased. However, it appears to be a great solution for the PHP people I work with.

3
votes

Step 1. Install "Apache_OpenOffice_4.1.2" on your system.

Step 2. Download the "unoconv" library from GitHub or anywhere else.

The paths in the code below are:

-> C:\Program Files (x86)\OpenOffice 4\program\python.exe = path to the python.exe in the OpenOffice install directory

-> D:\wamp\www\doc_to_pdf\libobasis4.4-pyuno\unoconv = path to the unoconv library folder

-> D:/wamp/www/doc_to_pdf/files/'.$pdf_File_name.' = path and file name of the PDF

-> D:/wamp/www/doc_to_pdf/files/'.$doc_file_name = path of your document file

If the PDF is not created, the last step is: go to Control Panel\All Control Panel Items\Administrative Tools -> Services, find "wampapache", right-click it and choose Properties, click the Log On tab, then check the "Allow service to interact with desktop" checkbox.

Create a sample .php file, put the code below in it, and run it on a WAMP or XAMPP server:

$result = exec('"C:\Program Files (x86)\OpenOffice 4\program\python.exe" D:\wamp\www\doc_to_pdf\libobasis4.4-pyuno\unoconv -f pdf -o D:/wamp/www/doc_to_pdf/files/'.$pdf_File_name.' D:/wamp/www/doc_to_pdf/files/'.$doc_file_name);

This code works for me on Windows 8.
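exec() only returns the last line of the command's output, which makes a failed conversion hard to diagnose. A sketch that builds the same unoconv call with every path escaped and captures the complete output and exit code; the paths are the ones from this answer, while output.pdf, input.docx and the function names are hypothetical placeholders.

```php
<?php
// Build the unoconv command line; every path is escaped so spaces are safe.
function unoconvCommand(string $python, string $unoconv, string $pdfOut, string $docIn): string
{
    return implode(' ', [
        escapeshellarg($python),
        escapeshellarg($unoconv),
        '-f pdf',
        '-o ' . escapeshellarg($pdfOut),
        escapeshellarg($docIn),
    ]);
}

// Run a command, capturing all output lines and the exit code.
function runUnoconv(string $command): bool
{
    $output = [];
    $exitCode = 0;
    exec($command . ' 2>&1', $output, $exitCode);
    if ($exitCode !== 0) {
        echo "unoconv failed ($exitCode):\n" . implode("\n", $output);
        return false;
    }
    return true;
}

$command = unoconvCommand(
    'C:\Program Files (x86)\OpenOffice 4\program\python.exe',
    'D:\wamp\www\doc_to_pdf\libobasis4.4-pyuno\unoconv',
    'D:/wamp/www/doc_to_pdf/files/output.pdf',   // hypothetical output name
    'D:/wamp/www/doc_to_pdf/files/input.docx'    // hypothetical input name
);
// runUnoconv($command);
```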

1
votes

I found a couple of solutions after a lot of googling. You can try them too if you are tired of searching for a good one.

For general use, via a SOAP API

You need a username and password to make SOAP requests to https://www.livedocx.com

Register at https://www.livedocx.com/user/account_registration.aspx and follow the steps.

Use the code below in your .php file.

ini_set ('soap.wsdl_cache_enabled', 0);

// you receive this username and password when you register
define ('USERNAME', 'Username'); 
define ('PASSWORD', 'Password');

// SOAP WSDL endpoint
define ('ENDPOINT', 'https://api.livedocx.com/2.1/mailmerge.asmx?wsdl');
 
// Define timezone
date_default_timezone_set('Europe/Berlin');
$soap = new SoapClient(ENDPOINT);
$soap->LogIn(
    array(
        'username' => USERNAME,
        'password' => PASSWORD
    )
);
$data = file_get_contents('test.doc');
$soap->SetLocalTemplate(
    array(
        'template' => base64_encode($data),
        'format'   => 'doc'
    )
);
$soap->CreateDocument();
$result = $soap->RetrieveDocument(
    array(
        'format' => 'pdf'
    )
);
$data = $result->RetrieveDocumentResult;
file_put_contents('tree.pdf', base64_decode($data));
$soap->LogOut();
unset($soap);

Follow this link for more information http://www.phplivedocx.org/

For Ubuntu

OpenOffice and unoconv need to be installed.

From the command prompt:

sudo apt-get remove --purge unoconv
git clone https://github.com/dagwieers/unoconv
cd unoconv
sudo make install

Now add the code below to your PHP script, and make sure the file is executable.

shell_exec('/usr/bin/unoconv -f pdf  folder/test.docx');
shell_exec('/usr/bin/unoconv -f pdf  folder/sachin.png');
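shell_exec() gives no exit status, so a failed conversion goes unnoticed. A sketch that checks unoconv is installed, runs it through exec() with the file name escaped, and returns the path of the PDF. It assumes, as in the example above, that no -o option is passed, so unoconv writes the PDF next to the input file with a .pdf extension. The function names are hypothetical.

```php
<?php
// Expected output path: the input's directory and base name, with a .pdf extension.
function unoconvPdfPath(string $input): string
{
    $info = pathinfo($input);
    $dir = ($info['dirname'] ?? '.') === '.' ? '' : $info['dirname'] . '/';
    return $dir . $info['filename'] . '.pdf';
}

// Convert one file with unoconv; returns the PDF path, or null on failure.
function convertWithUnoconv(string $input, string $unoconv = '/usr/bin/unoconv'): ?string
{
    if (!is_executable($unoconv)) {
        error_log("unoconv not found or not executable at $unoconv");
        return null;
    }
    $output = [];
    $exitCode = 0;
    exec(escapeshellarg($unoconv) . ' -f pdf ' . escapeshellarg($input) . ' 2>&1', $output, $exitCode);
    $pdf = unoconvPdfPath($input);
    return ($exitCode === 0 && is_file($pdf)) ? $pdf : null;
}
```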

Hope this helps.

0
votes

Have you tried http://www.phpdocx.com/? Plus, it can be hosted on your server too.

0
votes

For a PHP-specific option you could try PHPWord: this library is written in pure PHP and provides a set of classes to write to and read from different document file formats (including .doc and .docx). The main drawback is that the quality of the converted files can be quite variable.
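A minimal PHPWord sketch, assuming phpoffice/phpword and dompdf/dompdf have been installed with Composer (PHPWord's PDF writer delegates rendering to a separate library such as dompdf). The function name and its parameters are hypothetical, and the fidelity caveat above still applies.

```php
<?php
use PhpOffice\PhpWord\IOFactory;
use PhpOffice\PhpWord\Settings;

// Convert a .docx to PDF with PHPWord's PDF writer, rendered through dompdf.
// Assumes: composer require phpoffice/phpword dompdf/dompdf
function docxToPdfWithPhpWord(string $docx, string $pdf, string $vendorDir): void
{
    require_once $vendorDir . '/autoload.php';
    Settings::setPdfRendererName(Settings::PDF_RENDERER_DOMPDF);
    Settings::setPdfRendererPath($vendorDir . '/dompdf/dompdf');
    $phpWord = IOFactory::load($docx);              // reads the Word2007 (.docx) file
    $writer = IOFactory::createWriter($phpWord, 'PDF');
    $writer->save($pdf);
}

// Example: docxToPdfWithPhpWord('in.docx', 'out.pdf', __DIR__ . '/vendor');
```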

Alternatively, if you want a higher-quality option, you could use a file conversion API like Zamzar. You can use it to convert a wide range of office formats (and others) into PDF, and you can call it from any platform (Windows, Linux, OS X, etc.).

PHP code to convert a file would look like this:

<?php
$endpoint = "https://api.zamzar.com/v1/jobs";
$apiKey = "API_KEY";
$sourceFilePath = "/my.doc"; // Or docx/xls/xlsx etc
$targetFormat = "pdf";

// CURLFile sends the actual file contents as a multipart upload
$postData = array(
  "source_file" => new CURLFile($sourceFilePath),
  "target_format" => $targetFormat
);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $endpoint);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERPWD, $apiKey . ":");
$body = curl_exec($ch);
curl_close($ch);

$response = json_decode($body, true);
print_r($response);
?>
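Note that the POST above only creates a conversion job; the decoded response includes the job id, and the PDF has to be downloaded once the job has finished. A polling sketch, assuming the v1 endpoints GET /v1/jobs/{id} and GET /v1/files/{id}/content and the "successful"/"failed" status values described in Zamzar's API documentation; check those docs before relying on it. The function names are hypothetical.

```php
<?php
// Authenticated GET against the Zamzar API; returns the raw response body.
function zamzarGet(string $url, string $apiKey): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // file content may be served via a redirect
    curl_setopt($ch, CURLOPT_USERPWD, $apiKey . ':');
    $body = curl_exec($ch);
    curl_close($ch);
    if ($body === false) {
        throw new RuntimeException("Request failed: $url");
    }
    return $body;
}

// Poll the job until it finishes, then save the first converted file to $target.
function zamzarDownloadResult(int $jobId, string $apiKey, string $target, int $maxTries = 30): bool
{
    for ($i = 0; $i < $maxTries; $i++) {
        $job = json_decode(zamzarGet("https://api.zamzar.com/v1/jobs/$jobId", $apiKey), true);
        $status = $job['status'] ?? '';
        if ($status === 'successful') {
            $fileId = $job['target_files'][0]['id'];
            $pdf = zamzarGet("https://api.zamzar.com/v1/files/$fileId/content", $apiKey);
            return file_put_contents($target, $pdf) !== false;
        }
        if ($status === 'failed') {
            return false;
        }
        sleep(2); // wait before asking again
    }
    return false;
}
```

For example: zamzarDownloadResult($response['id'], $apiKey, '/my.pdf');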

Full disclosure: I'm the lead developer for the Zamzar API.

0
votes

The easiest way to do this in my experience is with the Cloudmersive free native PHP library, just call convertDocumentDocxToPdf:

<?php
require_once(__DIR__ . '/vendor/autoload.php');

// Configure API key authorization: Apikey
$config = Swagger\Client\Configuration::getDefaultConfiguration()->setApiKey('Apikey', 'YOUR_API_KEY');

$apiInstance = new Swagger\Client\Api\ConvertDocumentApi(
    new GuzzleHttp\Client(),
    $config
);
$input_file = "/path/to/file.docx"; // \SplFileObject | Input file to perform the operation on.

try {
    $result = $apiInstance->convertDocumentDocxToPdf($input_file);
    print_r($result);
} catch (Exception $e) {
    echo 'Exception when calling ConvertDocumentApi->convertDocumentDocxToPdf: ', $e->getMessage(), PHP_EOL;
}
?>

Be sure to replace $input_file with the appropriate file path. You can also configure it to use a byte array if you prefer to do it that way. The result will be the bytes of the converted PDF file.

0
votes

For anyone looking to do this on Ubuntu/Linux using PHP:

Ubuntu (the desktop edition, at least) comes with LibreOffice installed by default. You can use a shell command to run LibreOffice headless for this.

shell_exec('/usr/bin/libreoffice --headless --convert-to pdf:writer_pdf_Export --outdir /var/www/html/demo/public_html/src/var/output /var/www/html/demo/public_html/src/var/source/sample.doc');

Hope it helps others like me.
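A slightly more robust version of that call, as a sketch: the paths are escaped, the exit code is checked, and -env:UserInstallation points LibreOffice at a writable profile directory, which helps when PHP runs as a user (such as www-data) without a writable home directory. It uses plain pdf as the target so LibreOffice picks the matching export filter for the document type. The function names and the /tmp/lo-profile directory are hypothetical.

```php
<?php
// Build a headless LibreOffice conversion command with all paths escaped.
function libreofficeCommand(string $source, string $outDir, string $profileDir = '/tmp/lo-profile'): string
{
    return implode(' ', [
        '/usr/bin/libreoffice',
        '--headless',
        // use a writable profile directory instead of the PHP user's home
        escapeshellarg('-env:UserInstallation=file://' . $profileDir),
        '--convert-to pdf',
        '--outdir ' . escapeshellarg($outDir),
        escapeshellarg($source),
    ]);
}

// Run the conversion; returns the PDF path on success, null on failure.
function convertWithLibreOffice(string $source, string $outDir): ?string
{
    $output = [];
    $exitCode = 0;
    exec(libreofficeCommand($source, $outDir) . ' 2>&1', $output, $exitCode);
    // LibreOffice names the result after the source file, in --outdir
    $pdf = rtrim($outDir, '/') . '/' . pathinfo($source, PATHINFO_FILENAME) . '.pdf';
    return ($exitCode === 0 && is_file($pdf)) ? $pdf : null;
}
```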

0
votes

Another way to do this is to pass a parameter directly to the libreoffice command:

libreoffice --convert-to pdf /path/to/file.{doc,docx}


First, you need to download and install LibreOffice. It can be downloaded from here.
Then open your terminal/command prompt and go to the LibreOffice program directory; on Windows this is typically C:\Program Files\LibreOffice\program, where you'll find the soffice.exe executable.

From there you can run the conversion command above directly, or you can use soffice in place of libreoffice.
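Brace expansion such as file.{doc,docx} is a bash feature and won't work in the Windows command prompt. A sketch that picks the executable per OS and passes several files to a single soffice call instead, adding --headless for server use; the Windows path assumes the default install location, and the function names are hypothetical. Without --outdir, the PDFs are written to the current working directory.

```php
<?php
// Choose the LibreOffice executable for the current OS.
// The Windows path is the default install location; adjust it if LibreOffice lives elsewhere.
function sofficeBinary(): string
{
    return PHP_OS_FAMILY === 'Windows'
        ? 'C:\Program Files\LibreOffice\program\soffice.exe'
        : 'soffice';
}

// Build one command that converts several files at once; PHP supplies the list itself,
// so this does not rely on shell brace expansion.
function sofficeConvertCommand(array $files): string
{
    $args = array_map('escapeshellarg', $files);
    return escapeshellarg(sofficeBinary()) . ' --headless --convert-to pdf ' . implode(' ', $args);
}
```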