0
votes

I'm using POI 3.11 to read in two Excel files side by side to compare them.

When I cycle through the two files to find the number of rows/columns for each sheet, POI claims no rows for one of the sheets, but if I open one of the files, and make a single edit on a single sheet, and run it through POI again, then the numbers are different.

I wrote a small program to count the rows in a single file before and after I made the change, here's the output:

FileName: clean.xls Sheet name: [One] [One] row count: 1 Sheet name: [Two] [Two] row count: 2 FileName: modified.xls Sheet name: [One] [One] row count: 3499 Sheet name: [Two] [Two] row count: 10

Bear in mind that the change I am making is to sheet One only, and yet the row counts for both sheets seem to be updating. Importantly there are 3499 and 10 rows there respectively on both version of the file.

Here's a cut down version of the code:

package com.jpmc.firmrisk.tools.excel; import java.io.File; import java.io.FileInputStream; import java.io.FileNotFoundException; import java.io.IOException; import org.apache.log4j.Logger; import org.apache.poi.ss.usermodel.Row; import org.apache.poi.xssf.usermodel.XSSFSheet; import org.apache.poi.xssf.usermodel.XSSFWorkbook; public class ExcelFileParser { private static final Logger LOG = Logger.getLogger(ExcelFileParser.class); public static void main(String[] args) { String[] fileNames = new String[] { "clean.xls", "modified.xls" }; for (String fileName: fileNames) { try { LOG.info("FileName: " + fileName); FileInputStream file = new FileInputStream(new File(fileName)); XSSFWorkbook workbook = new XSSFWorkbook(file); for (int i = 0; i < workbook.getNumberOfSheets(); i++) { XSSFSheet sheet = workbook.getSheetAt(i); String sheetName = sheet.getSheetName(); LOG.info(" Sheet name: " + sheetName); int rowCount = 0; for (Row ignored: sheet) { rowCount++; } LOG.info(" " + sheetName + " row count: " + rowCount); } file.close(); } catch (FileNotFoundException e) { e.printStackTrace(); } catch (IOException ae) { ae.printStackTrace(); } } } }

Rather than using an iterator, I have also tried using the Sheet.getFirstRowNum(), Sheet.getLastRowNum() and Sheet.getPhysicalNumberOfRows() methods, but all claim there is no data on the sheet for the clean spreadsheet.

Has anyone seen this before? And is there a fix for it? OR is this a known problem with Excel files?

1
How are you fetching the number of rows on a sheet? And could it be that the input spreadsheet was generated by another library and not by Excel? - Gagravarr
@Sankumarsingh I've shared a cut down version of the code, my employer is pretty secretive at the best of times. :) - Chris Cooper
@Gagravarr As far as I know these workbooks are used as a template to merge other data into and have been created with Excel, the files they are merged with are generated in other software, but don't display this problem. - Chris Cooper
Try running BiffViewer against the original and re-saved files. Don't worry about the offsets, but see if there are any key differences in records present / missing - Gagravarr

1 Answers

1
votes

It turns out the original sheets were created with SpreadSheet Gears (A C# library apparently) rather than Excel, I'm guessing that POI and SSG have made different assumptions about the structure of an Excel save file, and this is where the problem stems from.