2
votes

I'm trying to retrieve a docx from database and try to process it by checking its content. I think mycode retrieved my desired file but it seems that I haven't fully understood APACHE POI. I got an error at my stacktrace saying that I the wrong POI any ideas?

Here's how I load the file:

public void loadFile(String FileName)
{
    InputStream is = null;
    try
    {
        //Connecting to MYSQL Database
        Class.forName(driver).newInstance();
        con = DriverManager.getConnection(url+dbName,userName,password);

        Statement stmt = (Statement) con.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT FILE FROM doccompfiles WHERE FileName = '"+ FileName +"'");

        while(rs.next())
        {
            is = rs.getBinaryStream("FILE");
        }

        HWPFDocument doc = new HWPFDocument(is);
        WordExtractor we = new WordExtractor(doc);

        String[] paragraphs = we.getParagraphText();
        JOptionPane.showMessageDialog(null, "Number of Paragraphs" + paragraphs.length);
        con.close();
    }
    catch(Exception ex)
    {
        ex.printStackTrace();
    }
}

Stacktrace:

org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:131)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138)
at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:106)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:174)
at documentComparisor.Database.loadFile(Database.java:156)
at documentComparisor.Home$5.actionPerformed(Home.java:195)
at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source)
at java.awt.Component.processMouseEvent(Unknown Source)
at javax.swing.JComponent.processMouseEvent(Unknown Source)
at java.awt.Component.processEvent(Unknown Source)
at java.awt.Container.processEvent(Unknown Source)
at java.awt.Component.dispatchEventImpl(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Window.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
at java.awt.EventQueue.access$000(Unknown Source)
at java.awt.EventQueue$3.run(Unknown Source)
at java.awt.EventQueue$3.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
at java.awt.EventQueue$4.run(Unknown Source)
at java.awt.EventQueue$4.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
at java.awt.EventQueue.dispatchEvent(Unknown Source)
at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.run(Unknown Source)
1
that is the most useful exception i have ever seenAlex Lynch

1 Answers

7
votes

As you should know, at the moment MS Office documents exist in two different formats: one is the old format that was used by versions of MS Office before 2007 (e.g. ".doc" or ".xls"), another is XML-based format that's used by newer versions (e.g. ".docx" or ".xlsx").

There's different parts in Apache POI that handle different formats. Names of key classes for handling files in old MS Office format generally start with "H", names of the classes for working with files in XML-based format start with "X".

So in your example to handle new format you should use XWPFDocument instead of HWPFDocument:

XWPFDocument doc = new XWPFDocument(is);