2
votes

I have a program (actually SSIS script task, but I don't suppose that matters) that creates an OLE DB connection to an Excel workbook, and reads the cell values in each worksheet, storing them in a SQL Server table.

Each worksheet has several sections of rows, each section being for a separate product. The first two rows of each product section are a quarter row, and a year row. Here is a screen shot:

enter image description here

I use an OleDbDataReader with a "Select *" command to read the data in each sheet into a DataTable. I have a column called "YearQuarter" in my SQL database, where I store a concatenation of the year row value and the preceding quarter row value, with a hyphen between the two strings:

enter image description here

My code is like this:

  OleDbConnection oleExcelConnection = new OleDbConnection(
        "Provider=Microsoft.ACE.OLEDB.12.0;" +
        "Data Source=" + strWkbkFilePath + ";" +
        "Mode=Read;" +
        "Extended Properties=\"Excel 8.0;HDR=No;IMEX=1\"");

   oleExcelConnection.Open();

   DataTable dtCurrSheet = new DataTable();

   // Name of table is in strLoadTblNm.

    OleDbCommand oleExcelCommand;
    OleDbDataReader oleExcelReader;

    oleExcelCommand = excel_conn.CreateCommand();
    oleExcelCommand.CommandText = "Select * From [" + strLoadTblNm + "]";
    oleExcelCommand.CommandType = CommandType.Text;
    oleExcelReader = oleExcelCommand.ExecuteReader();

    // Load worksheet into data table
    dtSheet.Load(oleExcelReader);

    oleExcelReader.Close();

Looking at the output data, I noticed that I was getting inconsistent results. Some rows would have a YearQuarter column value that would have only the Year row value in them, while others would have the cell values from both rows. For example, I'd have "2009 - Year End" followed by just "2010", with no " - 1st Qtr." appended to it.

This is because that quarter cell valued is never loaded into the data reader, as the Dataset Visualizer shows:

enter image description here

Notice also that, in the Dataset, the column that is missing the Quarter cell value also has other numeric values missing their formatting (no commas).

If I save the file as a .csv, all cell values are preserved.

However, I noticed that it wasn't consistent. Sometimes I'd run my package and the same row would now have the full value. So, in the above example, I'd get "2010 - 1st Qtr."

I finally realized that it was working as expected only if I happened to have the workbook open in Excel at the same time that the program was running!

Why would this make a difference? Could it be that there is a macro or something in the workbook that is executed by Excel, but not when the workbook is accessed only via an OLE DB connection? Would the fact that it had been executed in Excel then affect the data obtained by OLE DB? If that's the case, how do I get around this? The spreadsheets are provided to me. So I can't modify them.

2
It's not a solution to the problem, but have you considered exporting to CSV, TXT or something more readable? - vmgmail
What is the value of strLoadTblNm? - user2140173
As I explain above, if i save the file as a .csv, all cell values are preserved. However, this is not an option for this project. A workbook illustrating the problem is here: onedrive.live.com/…. If you download it, you can see that the problem cell in the example - F12 - is DT_R8 instead of DT_WSTR, which may well be an issue. - Buggieboy
@vba4all - strLoadTblNm is the full file path to the .xls file. You can download a copy of the .xls file here: onedrive.live.com/… - Buggieboy

2 Answers

2
votes

I think you're having issues with the auto-formatting thing Excel tries to apply. With an OLEDB connection, I can't see how having the sheet open fixes your problem (obviously very strange).

Try Adding IMEX = 1 to your connection options to treat the entire sheet as text to see if this is your issue. Pulled from OLEDB connection does not read data from excel sheet Also another good post from an external site: Tips for reading Excel spreadsheets using ADO.NET

Also, you're pulling data from an excel sheet and writing it to another excel sheet... Same workbook? I have a couple more ideas for ya though depending on your situation.

1
votes

This bug turns out to be a "feature", and it should come with a big warning sign.

This article (thanks, @vb4all) explains that "ADO.NET scans the first 8 rows of data, and based on that, guesses the datatype for each column. Then it attempts to coerce all data from that column to that datatype, returning NULL whenever the coercion fails!"

In other words, it is treating the worksheet as a relation table, in which all values in a given column are of the same type. Of course, worksheet data is not bound by this restriction.

This behavior can be gotten around by setting IMEX=1 in the connection string options and then modifying these registry settings:

Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/ImportMixedTypes

Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/Typ

(Note: registry keys vary depending on 32 vs. 64 bit. E.g., for 64-bit, the first one would be HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Jet 4.0).

I think this was a very risky design, inviting data transfer errors that could easily go unnoticed.