3
votes

While processing an excel file in Apache POI, I noticed that it would skip certain set of empty rows. After a lot of trial and error, I noticed that Apache POI would only read from those rows whose cells have ever been updated.

I have written a short program to read if a row in an XLSX (XSSF model) file is empty. This is my input excel file:

enter image description here

private static boolean isRowEmpty(Row row) {
        boolean isRowEmpty = true;
        if (row != null) {
            for (Cell cell : row) {
                if (cell != null) {
                    System.out.println("Row:" + cell.getRowIndex() + " Col:"
                            + cell.getColumnIndex() + " Len:"
                            + cell.toString().length());
                    isRowEmpty = false;
                } else {
                    System.out.println("Cell is Null at Row "+row.getRowNum());
                }
            }
        } else {
            System.out.println("Row is Null");
        }
        return isRowEmpty;
}

for (Row row : sheet) {
    if (isRowEmpty(row)) {
        System.out.println("Empty row at " + row.getRowNum());
    }
}

OUTPUT

Row:0 Col:0 Len:1
Row:2 Col:0 Len:1
Row:3 Col:0 Len:1
Row:4 Col:0 Len:1
Row:5 Col:0 Len:1
Row:6 Col:1 Len:1
Row:7 Col:0 Len:1
Row:8 Col:2 Len:1

In cell A5, I have entered a space, which gets detected by Apache POI. As you can see from the Output, it does not process row 2 (rownum 1).

Is there any workaround to this so that it gives the following output:

Row:0 Col:0 Len:1
Empty Row at 1
Row:2 Col:0 Len:1
Row:3 Col:0 Len:1
Empty Row at 4
Row:5 Col:0 Len:1
Row:6 Col:1 Len:1
Row:7 Col:0 Len:1
Row:8 Col:2 Len:1

Thanks!

UPDATE 1

Using (cell != null && StringUtils.isNotBlank(cell.toString())) instead of (cell != null) gives me the following output:

Row:0 Col:0 Len:1
Row:2 Col:0 Len:1
Row:3 Col:0 Len:1
Cell is Null for Row 4
Empty row at 4
Row:5 Col:0 Len:1
Row:6 Col:1 Len:1
Row:7 Col:0 Len:1
Row:8 Col:2 Len:1
1

1 Answers

1
votes

This is entirely as expected, as explained in the documentation!

The iterators are there to make life easy to grab the rows and cells with content in them (plus a few others that Excel has randomly still included in the file...).

If you want to fetch every row and cell, no matter if they are defined or not, then you need to follow the advice in the documentation and loop by row and cell number, eg

// Decide which rows to process
int rowStart = Math.min(15, sheet.getFirstRowNum());
int rowEnd = Math.max(1400, sheet.getLastRowNum());

for (int rowNum = rowStart; rowNum < rowEnd; rowNum++) {
   Row r = sheet.getRow(rowNum);
   if (r == null) {
      // Handle there being no cells defined for this row
      continue;
   }

   // Decide how many columns to fetch for this row
   int lastColumn = Math.max(r.getLastCellNum(), MY_MINIMUM_COLUMN_COUNT);

   for (int cn = 0; cn < lastColumn; cn++) {
      Cell c = r.getCell(cn, Row.RETURN_BLANK_AS_NULL);
      if (c == null) {
         // The spreadsheet is empty in this cell
      } else {
         // Do something useful with the cell's contents
      }
   }
}