2
votes

I am busy developing a component which imports data from a MS Excel (2016) file. This component uses the MS OpenXML SDK2.5 library. The end-users installation of MS Excel is based on Dutch country / region settings. The file contains, among others, a column with financial data (numeric). The position of this column is not known in advance.

To determine if a cell contains numeric data I evaluate the property Cell.DataType (of type CellValues, which is an enum). At first it seems that this property is the perfect candidate to determine this. Possible values of CellValues are: Boolean, Number, Error, SharedString, String, InlineString or Date. So I would expect that Cell.DataType is set to CellValues.Number. After some debugging I found out that Cell.DataType is null when the cell contains numeric data.

While searching on internet to find an explanation I found the following MSDN article: https://msdn.microsoft.com/en-us/library/office/hh298534.aspx

The article describes exactly what I found during debugging:

The Cell type provides a DataType property that indicates the type of the data within the cell. The value of the DataType property is null for numeric and date types.

Does anybody know why Cell.DataType is not initialized with respectively CellValues.Number or CellValues.Date?

What is the best way to determine if a cell contains a numeric value?

1

1 Answers

3
votes

Does anybody know why Cell.DataType is not initialized with respectively CellValues.Number or CellValues.Date?

Looking at the ECMA-376 standard from here, the (abbreviated) XSD for a Cell looks like this:

<xsd:complexType name="CT_Cell">
    ...
    <xsd:attribute name="t" type="ST_CellType" use="optional" default="n"/>
    ...
</xsd:complexType>

That attribute represents the type. Note that it is optional with a default value of "n". Section 18.18.11 ST_CellType (Cell Type) lists the valid values for the type which are:

b - boolean
d - date
e - error
inlineStr - an inline string
n - number (the default)
s - a shared string str - a formula string

You can see that "n" represents a number.

What is the best way to determine if a cell contains a numeric value?

It would seem from the above that you could check for a null Cell.DataType or a Cell.DataType of CellValues.Number to tell if a cell contains a number but it's not quite that simple - the big problem is dates.

It would seem that the original storage mechanism for dates was to use a number and rely on the style to know whether or not the number is actually a number or if the number represents a date.

Confusingly, the spec has been updated to include the Date type but not all dates will use the date type. The Date type means the cell contains a date in ISO 8601 format but it's perfectly valid for a date to be stored as a number with the correct style. The following XML snippet for example shows the same date (1st Feb 2017) in both Number and Date format:

<sheetData>
    <row r="1" spans="1:1" x14ac:dyDescent="0.25">
        <c r="A1" s="1">
            <v>42767</v>
        </c>
    </row>
    <row r="2" spans="1:1" x14ac:dyDescent="0.25">
        <c r="A2" s="1" t="d">
            <v>2017-02-01</v>
        </c>
    </row>
</sheetData>

Which looks like this when opened in Excel:

The resulting Excel file

If you need to differentiate between dates and numbers then you will need to find any numbers (null Cell.DataType or a Cell.DataType of CellValues.Number) and then check the style of those cells to ensure they are numbers and not dates disguised as numbers.