I'm writing a Java application that needs to read values from a web-generated .xls file.

Unfortunately, that file is not a real .xls file: it's a set of HTML tags, and the program that generates it simply gives it an .xls extension.

To read the cell values from the generated file, I intended to use the Apache POI library, but the library seems to read only genuine .xls files. Running the code gives the following error:

java.io.IOException: Invalid header signature; read 0x6D74683C0A0D0A0D, expected 0xE11AB1A1E011CFD0
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:140)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:322)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:303)
at excel.ReadAccountName.main(ReadAccountName.java:17)

I'm considering one of the following two solutions:

  1. Is it possible to convert the auto-generated .xls file to a genuine .xls format from within the Java code?

  2. Is there some other way to read from the auto-generated .xls file?

If there are any other possible solutions, please suggest them.

1 Answer

If it's pure HTML, you can use Jsoup or another HTML parser to extract the data from the source file, and then build an .xls file with POI.
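
A minimal sketch of that approach might look like the following. It assumes the generated file contains a single flat HTML table (no nested tables) and is encoded as UTF-8. The class name `HtmlXlsConverter`, the helper names `extractRows` and `toWorkbook`, and the sheet name `data` are made up for illustration, and every cell is stored as plain text.

```java
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names are illustrative, not from the question.
class HtmlXlsConverter {

    // Collects the text of every <td>/<th> cell, one inner list per <tr>.
    // Assumption: the document holds one flat table without nested tables.
    static List<List<String>> extractRows(Document doc) {
        List<List<String>> rows = new ArrayList<>();
        for (Element tr : doc.select("tr")) {
            List<String> cells = new ArrayList<>();
            for (Element cell : tr.select("td, th")) {
                cells.add(cell.text());
            }
            rows.add(cells);
        }
        return rows;
    }

    // Builds a genuine binary .xls workbook, storing every value as a string.
    static Workbook toWorkbook(List<List<String>> rows) {
        Workbook wb = new HSSFWorkbook();
        Sheet sheet = wb.createSheet("data");
        for (int r = 0; r < rows.size(); r++) {
            Row row = sheet.createRow(r);
            List<String> cells = rows.get(r);
            for (int c = 0; c < cells.size(); c++) {
                row.createCell(c).setCellValue(cells.get(c));
            }
        }
        return wb;
    }

    // Usage: java HtmlXlsConverter <fake.xls> <real.xls>
    public static void main(String[] args) throws IOException {
        // Assumption: the HTML file is UTF-8 encoded.
        Document doc = Jsoup.parse(new File(args[0]), "UTF-8");
        List<List<String>> rows = extractRows(doc);
        try (Workbook wb = toWorkbook(rows);
             OutputStream out = new FileOutputStream(args[1])) {
            wb.write(out);
        }
    }
}
```

If you only need to read the values, the list returned by `extractRows` may already be enough, and the POI step can be skipped entirely.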