Does Apache POI provide any function for us to count the number of "not-null" rows in a sheet of an Excel file?
At the first time, I have an Excel sheet with 10 data rows, the function worksheet.getPhysicalNumberOfRows() returns the exact number (10). But after that, I delete 3 rows, then that function still gets 10 rows. Maybe the total number of rows was cached anywhere by POI. What does getPhysicalNumberOfRows() mean? As its API described: "Returns the number of physically defined rows (NOT the number of rows in the sheet)", but I do not understand what "physically defined" mean. Can you help me on this issue?
Thank you so much!
5 Answers
If you delete the rows via worksheet.removeRow(Row row), then the physical row count should be 7.
POI uses a map to store the rows of a sheet. This map is the physical part. See http://www.google.com/codesearch/p?hl=de#WXzbfAF-tQc/trunk/src/java/org/apache/poi/hssf/usermodel/HSSFSheet.java
As to the logically null rows, try
int notNullCount = 0;
Sheet sheet = wb.getSheetAt(0);
for (Row row : sheet) {
for (Cell cell : row) {
if (cell.getCellType() != Cell.CELL_TYPE_BLANK) {
if (cell.getCellType() != Cell.CELL_TYPE_STRING ||
cell.getStringCellValue().length > 0) {
notNullCount++;
break;
}
}
}
}
If you look at the code of the POI 3.8 beta 3, you'll see that removing a row should also remove its record. As such, the number of physical rows in the file should go down too
I'd suggest you try with a newer version of POI
To count the number of non empty rows in a file, do something like Amadeus suggests and loop over the rows on the sheet and check if they have cells.
I had the same problem. If I delete the rows manually, still the rowcount did not decrease, when I checked using sheet.getPhysicalNumberOfRows().
When I deeply went into that issue I found the exact problem. I had an email column in my row, and when I entered an email address MS Office automatically detects that as an email address. And when I delete that entire row manually, the cell that carried the email address still had "" as its value(it will not be visible, but I found that value getting initialised when I read it through java). So since this cell has a not null value(""), this entire row is getting declared(sort of) and the row count is increased.
The funny part is that when I dont enter an email address, and just enter some string and then delete the row, the cell is not getting initialised and the ROWCOUNT GOT DECREASED actually. This is what I found as a result of my problem.
Finally I solved that, by not only addding null check for the cells, but also
if(cell != "")
Hope this might be useful for you
We can write a custom method that will ignore blank rows to give row count. Probably we can make some assumptions based on out requirement. For example in my case, a row can be considered blank if its first column value is empty and count is needed only till first blank row.
So following snippet could be useful:
public int getNonBlankRowCount(String sheetName){
int rowCount = 0;
int index = workbook.getSheetIndex(sheetName);
if(index==-1){
rowCount = -1;
return rowCount;
}else{
sheet = workbook.getSheetAt(index);
Iterator<Row> rowIterator = sheet.rowIterator();
rowCount = 0;
while (rowIterator.hasNext()) {
Row row = (Row) rowIterator.next();
cell = (HSSFCell) row.getCell(0);
String cellValue = cell.getStringCellValue();
if (cellValue.isEmpty()) {
break;
}
rowCount++;
}
return rowCount;
}
}