
I am attempting to create a DataTable from an Excel spreadsheet using OpenXML. When I read a row's cell value using Cell.CellValue.InnerXml, the value returned for a monetary amount is not the value the user entered and sees on the spreadsheet.

The spreadsheet cell is formatted as Text and the cell value is 570.81. When obtaining the data in OpenXML the value is interpreted as 570.80999999999995.

This method is used for many different Excel imports, where the data type of a cell (by header or column index) is not known when the table is built.

I've seen a few posts about the Ecma Office Open XML File Formats Standard that mention numFmtId. Could this be of value?

I assume that, since the data type is text and the number has two decimal places, something is treating the cell as if it had been rounded (even though no formula exists).

I am hopeful someone can offer a solution for properly interpreting the data.

Below is the GetCellValue method:

private static string GetCellValue(SharedStringTablePart stringTablePart, DocumentFormat.OpenXml.Spreadsheet.Cell cell,DocumentFormat.OpenXml.Spreadsheet.Stylesheet styleSheet)
{
    string value = cell.CellValue.InnerXml;

    if (cell.DataType != null && cell.DataType.Value == DocumentFormat.OpenXml.Spreadsheet.CellValues.SharedString)
    {
        return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;  
    }
    else
    {

        if (cell.StyleIndex != null)
        {
            DocumentFormat.OpenXml.Spreadsheet.CellFormat cellFormat = (DocumentFormat.OpenXml.Spreadsheet.CellFormat)styleSheet.CellFormats.ChildElements[(int)cell.StyleIndex.Value];

            int formatId = (int)cellFormat.NumberFormatId.Value;

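            if (formatId == 14) //built-in short date format (mm-dd-yy); [h]:mm:ss is ID 46
            {
                //the file always stores numbers with '.' as the decimal separator
                DateTime newDate = DateTime.FromOADate(double.Parse(value, CultureInfo.InvariantCulture));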
                value = newDate.Date.ToString(CultureInfo.InvariantCulture); 
            }
        }
        return value;
    }
}

1 Answer


As you point out in your question, the format is stored separately from the cell value using number formats in the stylesheet.

You should be able to extend the code you have for formatting dates to include formatting for numbers. Essentially you need to grab the NumberingFormat that corresponds to the cellFormat.NumberFormatId.Value you are already reading. The NumberingFormat can be found in the styleSheet.NumberingFormats elements.

Once you have this you can access the FormatCode property of the NumberingFormat which you can then use to format your data as you see fit.

Unfortunately the format is not quite that straightforward to use. Firstly, according to MSDN, not all formats are written to the file: the built-in ones are implied by their ID alone. So I guess you will have to keep those somewhere accessible and look them up by the NumberFormatId you have.
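As a rough sketch of that lookup, the table below covers a handful of the built-in IDs. The class name `BuiltInNumberFormats` and the method `Lookup` are made up for illustration, the list is deliberately partial, and the codes are the default (en-US) ones from the ECMA-376 specification, so treat them as an assumption for other locales.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of the Open XML SDK.
static class BuiltInNumberFormats
{
    // Partial list of built-in number formats that are never written to
    // styleSheet.NumberingFormats. Assumed en-US defaults from ECMA-376.
    private static readonly Dictionary<uint, string> Codes = new Dictionary<uint, string>
    {
        { 1, "0" },
        { 2, "0.00" },
        { 3, "#,##0" },
        { 4, "#,##0.00" },
        { 9, "0%" },
        { 10, "0.00%" },
        { 14, "mm-dd-yy" },
        { 22, "m/d/yy h:mm" },
        { 46, "[h]:mm:ss" },
        { 49, "@" } // Text
    };

    // Returns the format code for a built-in ID, or null if the ID is not in the table.
    public static string Lookup(uint numberFormatId)
    {
        string code;
        return Codes.TryGetValue(numberFormatId, out code) ? code : null;
    }
}
```

You could fall back to something like this when the search through styleSheet.NumberingFormats finds nothing; custom formats start at ID 164 and are always written to the file.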

Secondly, the syntax of the format string is not compatible with C#, so you'll need to do some manipulation. Details of the format layout can be found on MSDN.

I have knocked together some sample code that handles the currency situation you have in your question, but you may need to give some more thought to the parsing of the Excel format string into a C# one.

private static string GetCellValue(SharedStringTablePart stringTablePart, DocumentFormat.OpenXml.Spreadsheet.Cell cell, DocumentFormat.OpenXml.Spreadsheet.Stylesheet styleSheet)
{
    string value = cell.CellValue.InnerXml;

    if (cell.DataType != null && cell.DataType.Value == DocumentFormat.OpenXml.Spreadsheet.CellValues.SharedString)
    {
        return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
    }
    else
    {
        if (cell.StyleIndex != null)
        {
            DocumentFormat.OpenXml.Spreadsheet.CellFormat cellFormat = (DocumentFormat.OpenXml.Spreadsheet.CellFormat)styleSheet.CellFormats.ChildElements[(int)cell.StyleIndex.Value];

            int formatId = (int)cellFormat.NumberFormatId.Value;

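            if (formatId == 14) //built-in short date format (mm-dd-yy); [h]:mm:ss is ID 46
            {
                //the file always stores numbers with '.' as the decimal separator
                DateTime newDate = DateTime.FromOADate(double.Parse(value, CultureInfo.InvariantCulture));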
                value = newDate.Date.ToString(CultureInfo.InvariantCulture);
            }
            else
            {
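                //find the number format
                //(NumberingFormats is null when the workbook has no custom formats)
                NumberingFormat format = styleSheet.NumberingFormats == null
                    ? null
                    : styleSheet.NumberingFormats.Elements<NumberingFormat>()
                        .FirstOrDefault(n => n.NumberFormatId != null && n.NumberFormatId.Value == formatId);
                double temp;

                //the raw value always uses '.' as the decimal separator, so parse it culture-independently
                if (format != null 
                    && format.FormatCode != null
                    && format.FormatCode.HasValue 
                    && double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture, out temp))
                {
                    //we have a format and a value that can be represented as a double

                    string actualFormat = GetActualFormat(format.FormatCode, temp);
                    value = temp.ToString(actualFormat);
                }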
            }
        }
        return value;
    }
}

private static string GetActualFormat(StringValue formatCode, double value)
{
    //the format is up to 4 sections split by semi-colons
    //0 for positive, 1 for negative, 2 for zero (I'm ignoring the 4th section which is for text)
    string[] formatComponents = formatCode.Value.Split(';');

    int elementToUse = value > 0 ? 0 : (value < 0 ? 1 : 2);

    //sections can be omitted: with one section it applies to every number,
    //with two sections the first one also covers zero
    if (elementToUse >= formatComponents.Length)
    {
        elementToUse = 0;
    }

    string actualFormat = formatComponents[elementToUse];

    actualFormat = RemoveUnwantedCharacters(actualFormat, '_');
    actualFormat = RemoveUnwantedCharacters(actualFormat, '*');

    //strip the double quotes Excel puts around literal text
    //backslashes are an escape character it seems - I'm ignoring them
    return actualFormat.Replace("\"", "");
}

private static string RemoveUnwantedCharacters(string excelFormat, char character)
{
    /*  The _ and * characters are used to control lining up of characters
        they are followed by the character being manipulated so I'm ignoring
        both the _ and * and the character immediately following them.
        Note that this is buggy as I don't check for the preceding
        backslash escape character which I probably should
        */
    int index = excelFormat.IndexOf(character);
    while (index != -1)
    {
        //drop the character at index and the one after it (if there is one)
        int resumeAt = Math.Min(index + 2, excelFormat.Length);
        excelFormat = excelFormat.Substring(0, index) + excelFormat.Substring(resumeAt);
        index = excelFormat.IndexOf(character, index);
    }
    return excelFormat;
}

Given a sheet with the value 570.80999999999995 formatted using currency (in the UK) the output I get is £570.81.
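For cells where no usable format code is found (the Text-formatted cell in the question may be one of those), a possible fallback is to re-render the raw number with 15 significant digits, which is as many as Excel itself displays. The file stores the nearest double to 570.81 written out with 17 digits, which is where 570.80999999999995 comes from. This is only a sketch: `CellNumberText` and `Normalize` are hypothetical names, and it assumes you only call it for cells that really hold numbers, since a text value such as "007" would come back as "7".

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the Open XML SDK.
static class CellNumberText
{
    // Re-renders a raw cell value using at most 15 significant digits.
    // Values that do not parse as a number are returned unchanged.
    // Note: "G15" switches to scientific notation for magnitudes of 1E+15 and above.
    public static string Normalize(string rawValue)
    {
        double parsed;
        if (!double.TryParse(rawValue, NumberStyles.Float, CultureInfo.InvariantCulture, out parsed))
        {
            return rawValue;
        }
        return parsed.ToString("G15", CultureInfo.InvariantCulture);
    }
}
```

With this, CellNumberText.Normalize("570.80999999999995") returns "570.81", and a non-numeric string such as "abc" is passed through untouched.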