5
votes

I've been using an OleDb connection to read Excel files successfully for quite a while now, but I've run into a problem. Someone is trying to upload an Excel spreadsheet with nothing in the first column, and when I try to read the file, that column isn't recognized.

I'm currently using the following OleDb connection string:

Provider=Microsoft.Jet.OLEDB.4.0;
Data Source=c:\test.xls;
Extended Properties="Excel 8.0;IMEX=1;"

So, if there are 13 columns in the Excel file, the OleDbDataReader I get back has only 12 columns/fields.

Any insight would be appreciated.

6
If there is nothing in the first column, what's the problem? - StingyJack

6 Answers

3
votes

SpreadsheetGear for .NET gives you an API for working with xls and xlsx workbooks from .NET. It is easier to use and faster than OleDB or the Excel COM object model. You can see the live samples or try it for yourself with the free trial.

Disclaimer: I own SpreadsheetGear LLC

EDIT:

StingyJack commented "Faster than OleDb? Better back that claim up".

This is a reasonable request. I see claims all the time which I know for a fact to be false, so I cannot blame anyone for being skeptical.

Below is the code to create a 50,000 row by 10 column workbook with SpreadsheetGear, save it to disk, and then sum the numbers using OleDb and SpreadsheetGear. SpreadsheetGear reads the 500K cells in 0.31 seconds compared to 0.63 seconds with OleDB - just over twice as fast. SpreadsheetGear actually creates and reads the workbook in less time than it takes to read the workbook with OleDB.

The code is below. You can try it yourself with the SpreadsheetGear free trial.

using System;
using System.Data; 
using System.Data.OleDb; 
using SpreadsheetGear;
using SpreadsheetGear.Advanced.Cells;
using System.Diagnostics;

namespace SpreadsheetGearAndOleDBBenchmark
{
    class Program
    {
        static void Main(string[] args)
        {
            // Warm up (get the code JITed).
            BM(10, 10);

            // Do it for real.
            BM(50000, 10);
        }

        static void BM(int rows, int cols)
        {
            // Compare the performance of OleDB to SpreadsheetGear for reading
            // workbooks. We sum numbers just to have something to do.
            //
            // Run on Windows Vista 32 bit, Visual Studio 2008, Release Build,
            // Run Without Debugger:
            //  Create time: 0.25 seconds
            //  OleDb Time: 0.63 seconds
            //  SpreadsheetGear Time: 0.31 seconds
            //
            // SpreadsheetGear is more than twice as fast at reading. Furthermore,
            // SpreadsheetGear can create the file and read it faster than OleDB
            // can just read it.
            string filename = @"C:\tmp\SpreadsheetGearOleDbBenchmark.xls";
            Console.WriteLine("\nCreating {0} rows x {1} columns", rows, cols);
            Stopwatch timer = Stopwatch.StartNew();
            double createSum = CreateWorkbook(filename, rows, cols);
            double createTime = timer.Elapsed.TotalSeconds;
            Console.WriteLine("Create sum of {0} took {1} seconds.", createSum, createTime);
            timer = Stopwatch.StartNew();
            double oleDbSum = ReadWithOleDB(filename);
            double oleDbTime = timer.Elapsed.TotalSeconds;
            Console.WriteLine("OleDb sum of {0} took {1} seconds.", oleDbSum, oleDbTime);
            timer = Stopwatch.StartNew();
            double spreadsheetGearSum = ReadWithSpreadsheetGear(filename);
            double spreadsheetGearTime = timer.Elapsed.TotalSeconds;
            Console.WriteLine("SpreadsheetGear sum of {0} took {1} seconds.", spreadsheetGearSum, spreadsheetGearTime);
        }

        static double CreateWorkbook(string filename, int rows, int cols)
        {
            IWorkbook workbook = Factory.GetWorkbook();
            IWorksheet worksheet = workbook.Worksheets[0];
            IValues values = (IValues)worksheet;
            double sum = 0.0;
            Random rand = new Random();
            // Put labels in the first row.
            foreach (IRange cell in worksheet.Cells[0, 0, 0, cols - 1])
                cell.Value = "Cell-" + cell.Address;
            // Using IRange and foreach would be less code,
            // but we'll do it the fast way.
            for (int row = 1; row <= rows; row++)
            {
                for (int col = 0; col < cols; col++)
                {
                    double number = rand.NextDouble();
                    sum += number;
                    values.SetNumber(row, col, number);
                }
            }
            workbook.SaveAs(filename, FileFormat.Excel8);
            return sum;
        }

        static double ReadWithSpreadsheetGear(string filename)
        {
            IWorkbook workbook = Factory.GetWorkbook(filename);
            IWorksheet worksheet = workbook.Worksheets[0];
            IValues values = (IValues)worksheet;
            IRange usedRange = worksheet.UsedRange;
            int rowCount = usedRange.RowCount;
            int colCount = usedRange.ColumnCount;
            double sum = 0.0;
            // We could use foreach (IRange cell in usedRange) for cleaner 
            // code, but this is faster.
            // Row 0 holds the labels; data rows are 1 .. rowCount - 1.
            for (int row = 1; row < rowCount; row++)
            {
                for (int col = 0; col < colCount; col++)
                {
                    IValue value = values[row, col];
                    if (value != null && value.Type == SpreadsheetGear.Advanced.Cells.ValueType.Number)
                        sum += value.Number;
                }
            }
            return sum;
        }

        static double ReadWithOleDB(string filename)
        {
            string connectionString =
                "Provider=Microsoft.Jet.OLEDB.4.0;" +
                "Data Source=" + filename + ";" +
                "Extended Properties=Excel 8.0;";
            DataSet dataSet = new DataSet();
            // Dispose the connection even if Fill throws.
            using (OleDbConnection connection = new OleDbConnection(connectionString))
            {
                connection.Open();
                OleDbCommand selectCommand = new OleDbCommand("SELECT * FROM [Sheet1$]", connection);
                OleDbDataAdapter dataAdapter = new OleDbDataAdapter();
                dataAdapter.SelectCommand = selectCommand;
                dataAdapter.Fill(dataSet);
            }
            double sum = 0.0;
            // We'll make some assumptions for brevity of the code.
            DataTable dataTable = dataSet.Tables[0];
            int cols = dataTable.Columns.Count;
            foreach (DataRow row in dataTable.Rows)
            {
                for (int i = 0; i < cols; i++)
                {
                    object val = row[i];
                    if (val is double)
                        sum += (double)val;
                }
            }
            return sum;
        }
    }
}

1
votes

We always use Excel Interop to open the spreadsheet and parse it directly (similar to how you would scan through cells in VBA), or we create locked-down templates that require certain columns to be filled in before the user can save the data.

1
votes

You could take a look at ExcelMapper. It is a tool for reading Excel files as strongly typed objects, and it hides the details of reading an Excel file from your code. It handles the case where your Excel file is missing a column or where data is missing from a column, so you read only the data you are interested in. You can get the code/executable for ExcelMapper from http://code.google.com/p/excelmapper/.

0
votes

If you could require the Excel sheet to have column headers, then you would always have the 13 columns. You would just need to skip the header row when processing.

This would also correct situations where the user puts the columns in an order that you are not expecting: detect the column indexes in the header row and read accordingly.
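As a rough sketch of the header-row idea above, assuming the sheet has already been loaded into a DataTable whose first row holds the header text (for example, read with the header row treated as data). The class and method names here (HeaderRowMapper, MapHeaders, ReadColumn) are made up for illustration and are not part of any library:

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper: locates columns by header text instead of by position.
static class HeaderRowMapper
{
    // Builds a header-text -> column-index map from the first row of the table.
    public static Dictionary<string, int> MapHeaders(DataTable table)
    {
        var map = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        if (table.Rows.Count == 0)
            return map;

        DataRow header = table.Rows[0];
        for (int i = 0; i < table.Columns.Count; i++)
        {
            // DBNull converts to an empty string, so blank header cells are skipped.
            string name = Convert.ToString(header[i]).Trim();
            if (name.Length > 0 && !map.ContainsKey(name))
                map.Add(name, i);
        }
        return map;
    }

    // Returns the values of one named column, skipping the header row.
    public static List<object> ReadColumn(DataTable table, string headerName)
    {
        Dictionary<string, int> map = MapHeaders(table);
        int index;
        if (!map.TryGetValue(headerName, out index))
            throw new ArgumentException("Missing column: " + headerName);

        var values = new List<object>();
        for (int r = 1; r < table.Rows.Count; r++)
            values.Add(table.Rows[r][index]);
        return values;
    }
}
```

With something like this, HeaderRowMapper.ReadColumn(dataTable, "Qty") would return the "Qty" values no matter which position the user put that column in, and a missing header shows up as an exception rather than as silently shifted data.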

I see that others are recommending the Excel interop, but jeez that's a slow option compared to the OleDb way. Plus it requires Excel or OWC to be installed on the server (licensing).

0
votes

You might try using Excel and COM. That way, you'll be getting your info straight from the horse's mouth, as it were.

From D. Anand over on the MSDN forums:

Create a reference in your project to the Excel Object Library. The Excel Object Library can be added from the COM tab of the Add Reference dialog.

Here's some info on the Excel object model in C#: http://msdn.microsoft.com/en-us/library/aa168292(office.11).aspx

0
votes

I recommend trying Visual Studio Tools for Office and Excel Interop! They are very easy to use.