0
votes

I'm using ExcelDataReader to read .xls files and extract data. However, I cannot open .xls files which I got from external device due to Invalid file signature exception (I can open it with Excel, but it gives me a pop up that file format and extension don't match and if I want to open it anyway).

FileStream stream = File.Open(filePath, FileMode.Open, FileAccess.Read);
IExcelDataReader excelReader = ExcelReaderFactory.CreateReader(stream);

I have tried to open it using Interop, however, got ***.xls file cannot be accessed. The file may be corrupted, located on a server that is not responding, or read-only exception.

Missing missing = Missing.Value;
Application excel = new Application();
Workbook workbook = excel.Workbooks.Open(filePath,
         missing, missing, missing, missing, missing,
         missing, missing, missing, missing, missing,
         missing, missing, missing, XlCorruptLoad.xlRepairFile);

Is there any way to repair/restore/open/read those corrupted files?

p.s. Open manually with Excel and do SaveAs is not an option as I would need to do it for hundred of files.

Hex content

1
Have you considered giving these files their appropriate filename extensions? Are you sure they're not actually .xlsx-formatted files with a .xls name? - itsme86
It seems to me, given that Excel can open them (with a warning), like the files are not so much corrupted as they are misnamed. - itsme86
Open the file in a hex editor and show us a screenshot of the contents - Caius Jard
I have edited my post with hex snippet - Eduardas Šlutas
Try to change its extension to .zip and extract it then check your sheets `\Book11.xlsx\xl\worksheet` - Mo Khalefa

1 Answers

1
votes

From https://support.microsoft.com/en-gb/office/file-formats-that-are-supported-in-excel-0943ff2c-6014-4e8d-aaea-b83d51d46247 these files are XMLSS type (2003 xml) and should be given a .xml extension. This will stop Excel complaining that the contents don't match the name and hopefully will stop the error when trying via interop.

If interop doesn't work out you might find a library that can deal with XMLSS such as https://www.codeproject.com/Articles/8459/XmlSS-NET-Spreadsheet-Component or plain up reading it as an XML file and extracting the info you need. It might be that reading it as a DataSet (da = new DataSet() then ds.ReadXml(path_to_file)) gives an easy way to work with the data