
I'm trying to update a Microsoft Word document using Apache POI. The Word document is a template containing a number of placeholders of the form "${place.holder}", and all I need to do is replace each placeholder with a specific value. What I've got so far is:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

private void start() throws IOException {

    // Load the template from the classpath (getResourceAsStream returns null if it is missing).
    try (InputStream resourceAsStream = getClass().getResourceAsStream("/path/to/document/templates/RMA FORM.doc")) {
        POIFSFileSystem fsfilesystem = new POIFSFileSystem(resourceAsStream);
        HWPFDocument hwpfdoc = new HWPFDocument(fsfilesystem);
        Range range = hwpfdoc.getRange();

        // Replace each placeholder with its value.
        range.replaceText("${rma.number}", "08739");
        range.replaceText("${customer.name}", "Roger Swann");

        // Write the updated document; try-with-resources flushes and closes the stream.
        try (FileOutputStream fos = new FileOutputStream(new File("C:\\temp\\updatedTemplate.doc"))) {
            hwpfdoc.write(fos);
        }
    }

}

The program runs without errors. If I look at the output file in a hex editor, I can see that the program has replaced the placeholders. However, when I try to open the document in Word, Word crashes.

Is there a step (series of steps) that I'm missing, or am I basically out of luck with this? Do I need to adjust any counters because the length of the replacement text is not the same as the length of the replaced text?

Regards

Have you tried with POI 3.8 beta 4? There's been a lot of work done on .DOC handling in HWPF lately, so you may find it's fixed. - Gagravarr
I've tried with 3.8 beta 4 and I can now open the document, but the contents are heavily corrupted. I've even tried removing the updates from the program so that all I'm doing is opening the .doc and writing it back, and I still get the data corruption. - user497087
You're probably best off reporting a bug in the POI Bugzilla and uploading the problem file. - Gagravarr
That's been done. I assume that I'm not missing something stupid? - user497087
The Word .DOC file format is quite complex, and the documentation has only recently been made publicly available (before that you had to reverse engineer it). Alas, HWPF therefore isn't always complete and sometimes has problems with complex files. - Gagravarr

1 Answer


Use new FileInputStream(...) to open the template instead of getClass().getResourceAsStream("/path/to/document/templates/RMA FORM.doc").
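A minimal sketch of that suggestion, assuming the template sits on the filesystem rather than on the classpath. The class name TemplateLoader is hypothetical and not part of Apache POI; the code uses only the JDK. Nothing in this thread confirms that switching streams cures the corruption, so treat it as something to try rather than a known fix.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of Apache POI.
class TemplateLoader {

    // Opens the template from the filesystem instead of the classpath.
    // A missing file fails immediately with a clear message, whereas
    // getResourceAsStream would return null for a missing resource.
    static InputStream open(String path) throws FileNotFoundException {
        File file = new File(path);
        if (!file.isFile()) {
            throw new FileNotFoundException("Template not found: " + file.getAbsolutePath());
        }
        // The caller is responsible for closing the returned stream.
        return new BufferedInputStream(new FileInputStream(file));
    }
}
```

The returned stream could then take the place of resourceAsStream in the question's code, for example new POIFSFileSystem(TemplateLoader.open(templatePath)), where templatePath is whatever filesystem location holds the .doc file.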