0
votes

I am reading excel file using POI library in my java code. So far fine. But now I have one requirement. The excel file contains many records (e.g. 1000 rows). It also has column headers (1st row). Now I am doing excel filtering on it. Say I have one 'year' column and I am filtering all rows for year=2019. I get 15 rows. Question: I want to process only these 15 rows in my java code. Is there any method in poi library or way to know if the row being read is filtered or (the other way i.e. not filtered). Thanks.

I already have working code but right now I am looking for how to read only filtered row. Nothing new tried yet other than searching in library and forums.

The below code is inside a method. I am not used to formatting with stackoverflow so kindly ignore any formatting issue.

    // For storing data into CSV files
    StringBuffer data = new StringBuffer();
    try {
        SimpleDateFormat dtFormat = new SimpleDateFormat(CommonConstants.YYYY_MM_DD); // "yyyy-MM-dd"
        String doubleQuotes = "\"";
        FileOutputStream fos = new FileOutputStream(outputFile);
        // Get the workbook object for XLSX file
        XSSFWorkbook wBook = new XSSFWorkbook(new FileInputStream(inputFile));
        wBook.setMissingCellPolicy(Row.RETURN_BLANK_AS_NULL);

        // Get first sheet from the workbook
        //XSSFSheet sheet = wBook.getSheetAt(0);
        XSSFSheet sheet = wBook.getSheet(CommonConstants.METADATA_WORKSHEET);
        //Row row;
        //Cell cell;
        // Iterate through each rows from first sheet
        int rows = sheet.getLastRowNum();
        int totalRows = 0;
        int colTitelNumber = 0;
        Row firstRowRecord = sheet.getRow(1);
        for (int cn = 0; cn < firstRowRecord.getLastCellNum(); cn++) {
            Cell cellObj = firstRowRecord.getCell(cn);
            if(cellObj != null) {
                String str = cellObj.toString();
                if(CommonConstants.COLUMN_TITEL.equalsIgnoreCase(str)) {
                    colTitelNumber = cn;
                    break;
                }
            }
        }
        // Start with row Number 1. We don't need 0th number row as it is for Humans to read but not required for processing.
        for (int rowNumber = 1; rowNumber <= rows; rowNumber++) {
            StringBuffer rowData = new StringBuffer();
            boolean skipRow = false;
            Row rowRecord = sheet.getRow(rowNumber);
            if (rowRecord == null) {
                LOG.error("Empty/Null record found");
            } else {
                for (int cn = 0; cn < rowRecord.getLastCellNum(); cn++) {
                    Cell cellObj = rowRecord.getCell(cn);
                    if(cellObj == null) {
                        if(cn == colTitelNumber) {
                            skipRow = true;
                            break; // The first column cell value is empty/null. Which means Titel column cell doesn't have value so don't add this row in csv.
                        }
                        rowData.append(CommonConstants.CSV_SEPARTOR);
                        continue;
                    }
                    switch (cellObj.getCellType()) {
                        case Cell.CELL_TYPE_BOOLEAN:
                            rowData.append(cellObj.getBooleanCellValue() + CommonConstants.CSV_SEPARTOR);
                            //LOG.error("Boolean:" + cellObj.getBooleanCellValue());
                            break;

                        case Cell.CELL_TYPE_NUMERIC:
                            if (DateUtil.isCellDateFormatted(cellObj)) {
                                Date date = cellObj.getDateCellValue();
                                rowData.append(dtFormat.format(date).toString() + CommonConstants.CSV_SEPARTOR);
                                //LOG.error("Date:" + cellObj.getDateCellValue());
                            } else {
                                rowData.append(cellObj.getNumericCellValue() + CommonConstants.CSV_SEPARTOR);
                                //LOG.error("Numeric:" + cellObj.getNumericCellValue());
                            }
                            break;

                        case Cell.CELL_TYPE_STRING:
                            String cellValue = cellObj.getStringCellValue();
                            // If string contains double quotes then replace it with pair of double quotes.
                            cellValue = cellValue.replaceAll(doubleQuotes, doubleQuotes + doubleQuotes);
                            // If string contains comma then surround the string with double quotes.
                            rowData.append(doubleQuotes + cellValue + doubleQuotes + CommonConstants.CSV_SEPARTOR);
                            //LOG.error("String:" + cellObj.getStringCellValue());
                            break;

                        case Cell.CELL_TYPE_BLANK:
                            rowData.append("" + CommonConstants.CSV_SEPARTOR);
                            //LOG.error("Blank:" + cellObj.toString());
                            break;

                        default:
                            rowData.append(cellObj + CommonConstants.CSV_SEPARTOR);
                    }
                }
                if(!skipRow) {
                    rowData.append("\r\n");
                    data.append(rowData); // Appending one entire row to main data string buffer.
                    totalRows++;
                }
            }
        }
        pTransferObj.put(CommonConstants.TOTAL_ROWS, (totalRows));
        fos.write(data.toString().getBytes());
        fos.close();
        wBook.close();
    } catch (Exception ex) {
        LOG.error("Exception Caught while generating CSV file", ex);
    }
3
Paste some code that you have written so that people will help you.Sambit
Added. But isn't there any standward way of reading only filtered row from excel. The above code is just a sample one.Jitendra

3 Answers

2
votes

All rows which are not visible in the sheet have a zero height. So if the need is only reading the visible rows, one could check via Row.getZeroHeight.

Example

Sheet:

enter image description here

Code:

import java.io.FileInputStream;

import org.apache.poi.ss.usermodel.*;

class ReadExcelOnlyVisibleRows {

 public static void main(String[] args) throws Exception {

  Workbook workbook  = WorkbookFactory.create(new FileInputStream("SAMPLE.xlsx"));

  DataFormatter dataFormatter = new DataFormatter();

  CreationHelper creationHelper = workbook.getCreationHelper();

  FormulaEvaluator formulaEvaluator = creationHelper.createFormulaEvaluator();

  Sheet sheet = workbook.getSheetAt(0);

  for (Row row : sheet) {
   if (!row.getZeroHeight()) { // if row.getZeroHeight() is true then this row is not visible
    for (Cell cell : row) {
     String cellContent = dataFormatter.formatCellValue(cell, formulaEvaluator);
     System.out.print(cellContent + "\t");
    }
    System.out.println();
   }
  }

  workbook.close();

 }
}

Result:

F1    F2    F3      F4  
V2    2     2-Mai   FALSE   
V4    4     4-Mai   FALSE   
V2    6     6-Mai   FALSE   
V4    8     8-Mai   FALSE   
0
votes

You have to use auto filter provided in Apache Poi library and also you have set the freezing. I provide below the brief code snippet, you can use accordingly.

XSSFSheet sheet = wBook.getSheet(CommonConstants.METADATA_WORKSHEET);
sheet.setAutoFilter(new CellRangeAddress(0, 0, 0, numColumns));
sheet.createFreezePane(0, 1);
0
votes

I had to override some hooks and come up with my own approach to incorporate filtering of hidden rows in order to prevent processing of those. Below is code snippet. My approach consists of opening a second copy of the same sheet just so that I can query the current row getting processed to see if it's hidden or not. The answer above touches on this, the below expands on it to show how it can be nicely incorporated into the Spring batch excel framework. One drawback is that you have to open a second copy of the same file, but I couldn't figure out a way (perhaps there's none!) to get my hands on the internal Workbook sheet, among other reasons because org.springframework.batch.item.excel.poi.PoiSheet is package private (Note that below syntax is Groovy!!!):

  /**
   * Produces a reader that knows how to ingest a file in excel format.
   */
  private PoiItemReader<String[]> createExcelReader(String filePath) {
    File f = new File(filePath)
    PoiItemReader<String[]> reader = new PoiItemReader<>()
    reader.setRowMapper(new PassThroughRowMapper())
    Resource resource = new DefaultResourceLoader().getResource("file:" + f.canonicalPath)
    reader.setResource(resource)
    reader.setRowSetFactory(new VisibleRowsOnlyRowSetFactory(resource))
    reader.open(new ExecutionContext())
    reader
  }

...

// The "hooks" I overwrote to inject my logic

  static class VisibleRowsOnlyRowSet extends DefaultRowSet {
    Workbook workbook
    Sheet sheet

    VisibleRowsOnlyRowSet(final Sheet sheet, final RowSetMetaData metaData) {
      super(sheet, metaData)
    }

    VisibleRowsOnlyRowSet(final Sheet sheet, final RowSetMetaData metaData, Workbook workbook) {
      this(sheet, metaData)
      this.workbook = workbook
      this.sheet = sheet
    }

    boolean next() {
      boolean moreLeft = super.next()
      if (moreLeft) {
        Row row = workbook.getSheet(sheet.name).getRow(getCurrentRowIndex())
        if (row?.getZeroHeight()) {
          log.warn("Row $currentRow is hidden in input excel sheet, will omit it from output.")
          currentRow.eachWithIndex { _, int i ->
            currentRow[i] = ''
          }
        }
      }
      moreLeft
    }
  }

  static class VisibleRowsOnlyRowSetFactory extends DefaultRowSetFactory {
    Workbook workbook

    VisibleRowsOnlyRowSetFactory(Resource resource) {
      this.workbook = WorkbookFactory.create(resource.inputStream)
    }

    RowSet create(Sheet sheet) {
      new VisibleRowsOnlyRowSet(sheet, super.create(sheet).metaData, workbook)
    }
  }