0
votes

I am writing some code to import Excel files to database. The files could be big (thousands of rows), so I am using the Event API. POI version is 3.9

I am opening the file like this: FileInputStream fin = new FileInputStream(file);

//create record listener
HSSFRecordListener mainListener =  new HSSFRecordListener("aaa.xls");
// create a new org.apache.poi.poifs.filesystem.Filesystem
POIFSFileSystem poifs = new POIFSFileSystem(fin);
// get the Workbook (excel part) stream in a InputStream
din = poifs.createDocumentInputStream("Workbook");

Some files are cousing the last line to throw FileNotFoundException. Indeed, if I open those files with 7zip, there is no Workbook entry, but there is Book instead.

I have tried to work around this by opening the Book entry if Workbook is not found.

try {
    din = poifs.createDocumentInputStream("Workbook");
} catch (FileNotFoundException e) {
    try {
        din = poifs.createDocumentInputStream("Book");
    } catch (FileNotFoundException e1) {                    
        FileNotFoundException e2 = new FileNotFoundException("Neither Workbook nor Book found in file!");                    
        e2.initCause(e1);
        throw e2;
    }
}

This results in another exception:

org.springframework.web.util.NestedServletException: Request processing failed; nested exception is org.apache.poi.hssf.record.RecordFormatException: Unable to construct record instance
    org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:894)
    org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:647)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
    org.netbeans.modules.web.monitor.server.MonitorFilter.doFilter(MonitorFilter.java:393)
root cause

org.apache.poi.hssf.record.RecordFormatException: Unable to construct record instance
    org.apache.poi.hssf.record.RecordFactory$ReflectionConstructorRecordCreator.create(RecordFactory.java:65)
    org.apache.poi.hssf.record.RecordFactory.createSingleRecord(RecordFactory.java:301)
    org.apache.poi.hssf.record.RecordFactoryInputStream$StreamEncryptionInfo.<init>(RecordFactoryInputStream.java:65)
    org.apache.poi.hssf.record.RecordFactoryInputStream.<init>(RecordFactoryInputStream.java:182)
    org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:139)
    org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:106)
    pl.veracomp.service.SpreadsheetImportService.process(SpreadsheetImportService.java:369)
    pl.veracomp.controller.uploadController.onSubmit(uploadController.java:57)
    org.springframework.web.servlet.mvc.SimpleFormController.processFormSubmission(SimpleFormController.java:272)
    org.springframework.web.servlet.mvc.AbstractFormController.handleRequestInternal(AbstractFormController.java:268)
    org.springframework.web.servlet.mvc.AbstractController.handleRequest(AbstractController.java:153)
    org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handle(SimpleControllerHandlerAdapter.java:48)
    org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
    org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
    org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
    org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:647)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
    org.netbeans.modules.web.monitor.server.MonitorFilter.doFilter(MonitorFilter.java:393)
root cause

org.apache.poi.hssf.record.RecordFormatException: Not enough data (0) to read requested (2) bytes
    org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:216)
    org.apache.poi.hssf.record.RecordInputStream.readShort(RecordInputStream.java:233)
    org.apache.poi.hssf.record.InterfaceHdrRecord.<init>(InterfaceHdrRecord.java:43)
    sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    org.apache.poi.hssf.record.RecordFactory$ReflectionConstructorRecordCreator.create(RecordFactory.java:57)
    org.apache.poi.hssf.record.RecordFactory.createSingleRecord(RecordFactory.java:301)
    org.apache.poi.hssf.record.RecordFactoryInputStream$StreamEncryptionInfo.<init>(RecordFactoryInputStream.java:65)
    org.apache.poi.hssf.record.RecordFactoryInputStream.<init>(RecordFactoryInputStream.java:182)
    org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:139)
    org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:106)
    pl.veracomp.service.SpreadsheetImportService.process(SpreadsheetImportService.java:369)
    pl.veracomp.controller.uploadController.onSubmit(uploadController.java:57)
    org.springframework.web.servlet.mvc.SimpleFormController.processFormSubmission(SimpleFormController.java:272)
    org.springframework.web.servlet.mvc.AbstractFormController.handleRequestInternal(AbstractFormController.java:268)
    org.springframework.web.servlet.mvc.AbstractController.handleRequest(AbstractController.java:153)
    org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handle(SimpleControllerHandlerAdapter.java:48)
    org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
    org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
    org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
    org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:647)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
    org.netbeans.modules.web.monitor.server.MonitorFilter.doFilter(MonitorFilter.java:393)

Google have some info about fixed bugs in POI 3.2 and 3.7 which have something to do with exception Not enough data (0) to read requested (2) bytes, but it seems that it was something else.

The same files can be opened successfully in Excel 2007. When I save them manually with Save As=>Excel 97/2003, 7zip shows that the Book entry have been replaced with Workbook and I can successfully import them with Apache POI.

Did anyone found this issue? How to workaround it?

EDIT

The problem is when I try to open files saved in Microsoft Excel 5.0/95 file format.

To reproduce this issue create new spreadsheet, enter any data, and save as => Microsoft Excel 5.0/95 Workbook (*.xls).

Is there any way to read this format with apache POI or do I have to force my users to upgrade their workbooks before uploading?

3
Where did the file come from? The Not enough data (0) to read requested (2) bytes error seems to indicate it's an old or non-standard file...Gagravarr
The files are internal pricelists, genereted by with PHP with Pear_Excel_Spreadseet_Writer, which are then opened by our product managers, modified, etc, and saved.SWilk
Ok, I know what it is about... The Spreadsheet_Excel_Writer is a very old library. It generates so old file format, that the 7zip cannot even open it to show struture. Still, when open with Office and saved the format is "updated" to Microsoft Excel 5.0/95.SWilk
POI doesn't support Excel 95, only Excel 97 and newer, so if the file is in that old format it won't ever workGagravarr

3 Answers

1
votes

At the time of your question, there wasn't another solution with Apache POI. The good news is that there is now!

In new versions of Apache POI, if you call HSSFWorkbook or WorkbookFactory with one of these old files, you'll get the more helpful OldExcelFormatException thrown

If you want to get some information out of these files, then OldExcelExtractor is able to fetch text and numbers from formats including Excel 95 (and older!).

In order to support that, there's also some Record classes for these, so you could do some event-based parsing to handle them in more detail. There's no friendly UserModel support though

1
votes

More generally, from the POI perspective, OpenOffice or LibreOffice may write old MS Office format documents in a better quality than MS Office. I worked around this problem when POI failed to read a 97 version .xls file as HSSFWorkbook.

0
votes

It's a version problem: The file is in an old version. To confirm this, open your file with a new version of Excel, Modify it, save it and retry.