1
votes

I want to read a large excel-file(.xlsx/.xls). When I upload a 20MB file, suddenly Java-Heap was increased by 2GB and ran into OutOfMemoryError.

private Sheet getSheetForFileType(String filType, InputStream fileData) throws IOException {
    Workbook workbook;
    Sheet sheet;
    if (filType.equalsIgnoreCase("xls")) {
        workbook = new HSSFWorkbook(fileData); //OutOfMemoryError
        sheet = workbook.getSheetAt(0);
    } else {
        workbook = new XSSFWorkbook(fileData); //OutOfMemoryError
        sheet = workbook.getSheetAt(0);
    }
    return sheet;
}

As mentioned here Apache-POI overview , I tried with XSSF and SAX (Event API) Modified code as below:

private Sheet getSheetForFileType(String filType, InputStream fileData) throws IOException {

    if (filType.equalsIgnoreCase("xls")) {
        ....
    } else {
        OPCPackage opcPackage = OPCPackage.open(fileData);  //OutOfMemoryError
        XSSFReader xssfReader = new XSSFReader(opcPackage);
        SharedStringsTable sharedStringsTable = xssfReader.getSharedStringsTable();
        XMLReader parser = getSheetParser(sharedStringsTable);
        ....
        ....
    }
    return sheet;
}

Yet, I'm unable to load-file and read it.

I read file-data from Input-Stream, purpose is ONLY to Read-Data no write operations on it.

Reading a File takes lower memory, while an InputStream requires more memory as it has to buffer the whole file.

I went through other posts, what i understand:

  • increase Heap Memory
  • excel-streaming-reader [can't use, i've to support both(.xlsx/.xls)]
  • read using SAX parser

Update-1: Added a sample-excel picture.

sample excel

1
Have you looked at other related questions and answers on Stack Overflow that suggest reading from a file rather than from an InputStream when encountering OutOfMemoryErrors? Is this a viable option? Have you tried this?Jonny Henly
@JonnyHenly, yes i searched in other posts. 1st code, i followed in memory, DOM approach(problem here is it can't read until whole file is loaded), 2nd code i followed XSSF and SAX approach as mentioned in the official examples/doc. yet no-luckPawan
If even the OPCPackage.open(fileData) leads to out of memory error, then you have no chance. But I really doubt that because it does nothing else than opening the ZIP archive of the Office Open XML *.xlsx. That normally cannot lead to out of memory error.Axel Richter
If you are interested just in values, why not to export all the xls/xlsx files to csv through excel? you can create a small macro just to batch it and work with csv (that won´t give you any problem) better than xls/xlsx if you can not handle it.David García Bodego

1 Answers

0
votes

Try to use very efficient and high performance streaming SXSSFWorkbook class instead of XSSFWorkbook (which keeps the entire Excel workbook in memory) like below:

SXSSFWorkbook workbook = new SXSSFWorkbook(100);

where 100 is the default number of rows that will be kept in memory and processed in real time.