9
votes

I have an issue getting worksheet names from an Excel spreadsheet using OLEDB. The problem is that when I use GetOleDbSchemaTable, the resulting DataTable has more than just the actual worksheet names; it has extra rows for "Tables" that I can only assume are used internally by Excel.

So for example, if I have a worksheet named myWorksheet, the code below might end up with a list that contains myWorksheet$, myWorksheet$PrintTable and myWorksheet$_. Only the first myWorksheet$ record is for the actual worksheet. The others are just garbage that I don't need. When you look at them in metadata they look just like regular tables, even with the type of TABLE.

For now I just manually filtered out anything with "$_" or "$Print" in the name, but who knows what other Excel feature might make these extra records turn up in a different format.

Does anyone know the best way to get ONLY actual worksheet names, and not these internal tables that aren't worksheets? Is there something in metadata that would differentiate them?

 private ArrayList getXlsWorksheetNames(OleDb.OleDbConnection conn)
    {
        ArrayList wsList = new ArrayList();
        DataTable schemaTable;

        try
        {
            conn.Open();
            schemaTable = conn.GetOleDbSchemaTable(OleDb.OleDbSchemaGuid.Tables, null);

            foreach (DataRow row in schemaTable.Rows)
            {
                //form.appendToResultsTxt("Adding worksheet to list: " + Environment.NewLine +
                //    "Name = " + row.Field<string>("TABLE_NAME") + "," + Environment.NewLine +
                //    "Type = " + row.Field<string>("TABLE_TYPE") + "," + Environment.NewLine + Environment.NewLine);
                wsList.Add(row.Field<string>("TABLE_NAME"));
            }
            conn.Close();
        }
        catch (Exception ex)
        {
            if (this.mode == Cps2TxtUtilModes.GUI_MODE)
            {
                this.form.appendToResultsTxt(ex.ToString());
            }
            throw;
        }

        return wsList;
    }

I read through the article at this link, but they don't seem to be doing anything differently than I am, and I don't see any filtering out of extra non-worksheet tables, so Microsoft doesn't seem to have provided the right answer.

http://support.microsoft.com/kb/318452

And I've also looked around alot of StackOverflow, like at the thread from the link below which was helpful, but doesn't solve this one problem.

Using Excel OleDb to get sheet names IN SHEET ORDER

Before anyone asks, I'd also like to say that I don't really have control over what features are used in the spreadsheet, so I can't just tell them "Don't turn on filtering" or "Don't use print tables".

Any ideas are much appreciated. Thanks!

4
Just filter out anything that doesn't with "$". I've been using that for a while and it's never failed, even when there are pivot tables, filtered lists, etc.Sid Holland
Thank you; I'll do that. Like I said in my comment below, that's less than ideal because it relies on an implementation detail, but if that's what I've got to do, then so be it.Jim
My answer to the question you have linked in above shows how you could do it using DAO, but if you don't want to use COM then that's not an option. However the addendum to the answer by Esen shows yet another way by renaming the .xlsx to .zip, opening the zip file contents and reading one of the xml files therein for the sheet names. I haven't tested to see if hidden sheets appear in there and it will only work on 2007 (.xlsx) files, but it might be worth a shot.Sid Holland
Good info Sid, but unfortunately for me I have to support Excel 2003, and I definitely don't want to use COM / interop. It's very good to know though that XLSX files can be renamed to zips and opened to get some gems of metadata. Hope that tip helps others; it might help me at some point in the future, but not on this project.Jim
Just FYI to everyone that might look at this for future reference. If a worksheet has a space in the name, the table name in metadata will have single quotes around it, meaning that the $ WILL NOT be the last character in the table name, even if it's a valid worksheet. So just looking for everything that has $ as the last character isn't good enough. You have to first remove any surrounding single qoutes, then evaluate whether $ is the last character. This, to say the least, is lame. Why doesn't Microsoft include something in metadata that signifies a user worksheet?Jim

4 Answers

4
votes

The question is old, but for those who found it now, the skipping can be done as Jim found...

// skip those that do not end correctly
foreach (DataRow row in schemTable.Rows)
{
    string sheetName = row["TABLE_NAME"].ToString();
    if (!sheetName.EndsWith("$") && !sheetName.EndsWith("$'"))
        continue;
    Console.WriteLine(sheetName);
}

That is the wanted are or those that end with $ or those that end with $'.

2
votes

From experience, it seems to be all those whose name ends in a dollar sign. I've come across scenarios from clients where extra worksheets seemed to be appearing which weren't present in the data - these later turned out to be hidden worksheets in Excel!

0
votes

The first way that comes to my mind is the same way akash88 listed in your link to Using Excel OleDb to get sheet names IN SHEET ORDER link.

You can take akash88's approach and clean it up a little so the code is nicer to read.

        var wsList = from s in schemaTable
                     where s.Field<string>("TABLE_NAME").Contains("$")
                     select s.Field<string>("TABLE_NAME");
0
votes

You can test EndsWith("$") instead of Contains("$") like below:

List<String> lstsheetNames = new List<String>();
String sheetName;
foreach (DataRow row in schemaTable.Rows)
{
    sheetName = row.Field<string>("TABLE_NAME");
    String strTemp = sheetName.Split(' ');

    if(strTemp.Length == 1 && sheetName.EndsWith("$"))
       lstsheetNames.Add(sheetName.Substring(0, sheetName.Length - 1));

    else if(strTemp.Length > 1 && strTemp.GetValue(strTemp.Length - 1).ToString().EndsWith("$'"))
       lstsheetNames.Add(sheetName.Substring(1, sheetName.Length - 3));
}

I have used this code in a same problem and it works fine.

Edit : Sorry,I did not pay attention to this.I changed the code now.It might not the best or shortest way but it works.