Why we need to create a Workbook before reading the Excel file using Apache POI?

Question

Please find a Code snippet below

public class DataDriven_GetDataExcel {

public static void main(String[] args) throws  IOException, EncryptedDocumentException, InvalidFormatException {   

//1 Getting Control over File
FileInputStream fis = new FileInputStream("C:\\Users\\bewosaurabh\\Documents\\GetDataFile.xlsx");

//2 Creating a Workbook
Workbook wb = WorkbookFactory.create(fis);

//3 Getting Control over Sheet
Sheet sh = wb.getSheet("Sheet1");
            .
            ......

What I don't understand is why we need to create a Workbook before reading the Excel file? An Excel file is also called as Workbook (as we can see in below picture).

When we Create an excel file that means we are creating a Workbook. From there, we access the Sheets followed by rows and columns.

I don't understand why we write write WorkbookFactory.create(fis); when we already have a 'Workbook' We should have some methods to get the Workbook we have created like we have for Rows(getRow), Sheets (getSheet), Cells (getCell).

Can you help me understand POI?

WorkbookFactory.create does not create a workbook (an Excel file). It instantiates a class that represents a workbook. — GSerg

Axel Richter Axel Richter · Accepted Answer · 2017-08-16T09:30:31

What Workbook wb = WorkbookFactory.create(fis); does is:

Instantiating a Java object, which implements Workbook, from the content of the file read using the file input stream. After that the Workbook object is then located in memory. And only after having access to this Workbook object we can use its methods.

If we would use Workbook wb = WorkbookFactory.create(file);, that is using a File instead of an InputStream, then the WorkbookFactory would create the Workbook object directly from the file. The advantage of this is that not the whole file content must be read into the memory. So we have a lower memory footprint. The disadvantage is that the file which is opened for reading cannot be used for writing the same time. So we cannot write changings we have made using the methods of Workbook into the same file we have read the Workbook from.

If memory footprint is a bigger issue, then for XSSF (*.xlsx), we can get at the underlying XML data and process it using XSSF and SAX (Event API). Using this we need not to instantiate a Workbook object. Instead we are reading and parsing the XML directly from the OPCPackage which is a ZipPackage in case of XSSF (*.xlsx) since a *.xlsx is simply a ZIP archive containing a directory structure containing XML files and other files.

Since a *.xlsx is simply a ZIP archive we also could opening it as FileSystem gotten from FileSystems and then process its content totally independent from third party libraries. But this will be the most challenging approach.

Why we need to create a Workbook before reading the Excel file using Apache POI?

1 Answers