0
votes

This is old question I posted:

Reading one & Update some other Excel with c#

As suggested I created schema.ini file. My excel files have so many columns (many of them are not fixed) with mixed data. Even a cell contains numbers along with text. I observed that NOT ALL values are shown when I read excel using OLEDB and populate into a DataTable.

I can't assume ALL columns are put them into .ini file. Columns in my excel will go up to "DX". I observed that only 1st row which has number+text value are shown but similar text appears somewhere down aren't shown. It's shows as blank.

Here is connection string:

string strConn = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + FilePath+ "';Extended Properties=\"Excel 12.0;HDR=YES;IMEX=1;TypeGuessRows=0;ImportMixedTypes=Text\"";

Is there any solution so it reads all types of data?

1
please re-format your codeChristopher H.
Could you explain "re-format"?user1480864
put four spaces before your code,that's allChristopher H.
and how does your table look like ? what do you mean by "not all values are shown" ?Christopher H.
I didn't understand "put four space before your code"? Columns in my excel will go up to "DX". I observed that only 1st row which has number+text value are shown but similar text appears somewhere down aren't shown. It's shows as blank.user1480864

1 Answers

3
votes

This comes up a lot and it's very understandable because the documentation is somewhat lacking

Microsoft.ACE.OLEDB.12.0 does not handle columns of mixed data types very well. So what happens is that the driver will always read the first n values in each column and assign a datatype depending on what it finds in the first n cells of the column. n is determined by the setting of a registry key. It moves around depending on whether you have a 64 bit implementation or a 32 bit one but the 64 bit key is in...

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows

Sadly it's not always convenient to go around modifying registry keys and it would have been much better to leave this setting on the connection string but it is what it is. The default value for this is 8 rows.

If the driver finds mixed data types then and only then does the setting of IMEX come into play. If IMEX=1 is included then a column of mixed data types is returned as text. If it is not specified then any values which do not correspond to the assigned data type are returned as null.

This is where HDR=No is useful. If you have a header then specify HDR=No and read it. This will help ensure that the column is returned as text as long as your headers are all text as well of course. You can then discard the header before processing the data. It won't help if you have a majority numeric/date time data types in the first n cells of the column.

As an aside the driver will read all types of Excel files including .xls, .xlsm and .xlsx - there is no need to change the extended properties away from Excel 12.0 to do so. This is a considerable advantage.

The older Microsoft.Jet.OLEDB.4.0 was good in that you could specify TypeGuessRows and ImportMixedTypes in the connection string but Microsoft.ACE.OLEDB.12.0 completely ignores them so you can remove them from your connection string as their presence is misleading. The older driver can only read .xls files.

Both drivers will only read 255 columns without amending the SELECT statement. To read more than 255 columns you specify a range. E.g.

Select * From [Sheet1$IV:SP]

will read columns 256-510. If your sheet ends on DX it is well within the 255 column limit.

Hidden columns are always returned.

There are a couple of nasties with this driver. Firstly leading empty rows or columns are ignored completely. This can really mess things up if you are expecting data in a specific rows/columns. Secondly Excel incorrectly treats 29/Feb/1900 as a valid date but OLEDB does not. You can stick 29/Feb/1900 into an Excel spreadsheet just fine but OLEDB will return it as 28/Feb/1900. I can't see anything else it could do really.

The driver is a very handy and cheap way of reading well formatted Excel spreadsheets as long as you are aware of the limitations and can code around them.

Good luck.