4
votes

I have a working ExcelPackage function in an MVC 5 application where I can successfully output a strongly-typed model to a new Excel file.

I have a case where a specific column is nullable DateTime with four possible values:

  • null
  • date only
  • time only
  • date and time

I have extracted a function that does the job of identifying any DateTime values in the Worksheet. Although the DateTime values are consistently identified and formatted correctly, I can't help but think there might be a better way to achieve the same outcome than a nested loop.

Because this function (and the containing class) are designed to accept any IEnumerable, I cannot predict which column -- or even if -- the IEnumerable will contain DateTime values. For that reason, I cannot hard code for a known range of cells.

To provide better context, the class constructor is pasted here, followed by the function that I am trying to make better. Even if you take away the comments, it's still a pretty ugly set of nested code.

In summary, my question is this: am I missing a more elegant or simpler coding approach in terms of the C# language or the Nuget Epplus package?

public class EpplusExcelPackage<T>
{
    private IEnumerable<T> _data;
    private string _reportName;

    public EpplusExcelPackage(IEnumerable<T> Data, string ReportName)
    {
        this._data = Data;
        this._reportName = ReportName;    
    }

    // much more code...

This is the method I hope to make more efficient:

private static void FormatDateTimeValuesInWorksheet(ExcelWorksheet worksheet)
{
    /* correctly format datetime values as:
     *     if date only, format as shortdate 
     *     if time only, format as am/pm time
     *     if date & time present, format as default datetime */

    // the worksheet is data is a strongly-typed model, populated in the model constructor

    System.DateTime dateValue; // used as the out variable of DateTime.TryParse()

    // nested for-loop to find datetime values in worksheet
    for (int i = worksheet.Dimension.Start.Column; i < worksheet.Dimension.End.Column; i++)
    {
        for (int j = worksheet.Dimension.Start.Row; j < worksheet.Dimension.End.Row; j++)
        {
            // ignore null cell values to prevent null exception error
            if (worksheet.Cells[i, j].Value != null)
            {
                // convert the cell value to string: required by TryParse()
                string cellValue = worksheet.Cells[i, j].Value.ToString();

                // identify type of datetime and format accordingly
                if (DateTime.TryParse(cellValue, out dateValue))
                {
                    if (dateValue.Date == Convert.ToDateTime("12/30/1899"))
                    {
                        worksheet.Cells[i, j].Value = dateValue.ToShortTimeString();
                    }
                    else if (dateValue.TimeOfDay.TotalSeconds == 0)
                    {
                        worksheet.Cells[i, j].Value = dateValue.ToShortDateString();
                    }
                    else // do not change
                    {
                        worksheet.Cells[i, j].Value = worksheet.Cells[i, j].Value;
                    }
                }
            }
        }
    }
}
1

1 Answers

9
votes

It comes down to how confident you are that the source excel tables are "properly" formatted. By that I mean are they stored as proper dates (i.e. numeric) or could you have the common excel problem of "numbers stored as string".

If the data is generally clean then you could avoid much of the casting back and forth you are doing to strings and dates by checking their types. Even this is not completely straight forward since Epplus likes to do its own interpretation when importing dates.

Take a look at this table (focus on col A):

enter image description here

Rows 1 - 4 have "properly" formatted data. That means that the dates and time are stored in excel as doubles. Rows 5-8 are "badly" formatted - numbers (and date/time) stored as string. If you run this:

var workbook = pck.Workbook;
var worksheet = workbook.Worksheets.First();
var cells = worksheet.Cells;

foreach (var cell in cells)
    Console.WriteLine($"{{Cell: {cell.Address}, Display: {cell.Text}, Value: {cell.Value}, Type: {cell.Value.GetType()}}}");

You get this in the output:

{Cell: A1, Display: 11:33:00 AM, Value: 0.48125, Type: System.Double}
{Cell: A2, Display: 1/1/2016, Value: 1/1/2016 12:00:00 AM, Type: System.DateTime}
{Cell: A3, Display: 1/1/16 11:33 AM, Value: 42370.48125, Type: System.Double}
{Cell: A4, Display: 1264, Value: 1264, Type: System.Double}
{Cell: A5, Display: 11:33:00 AM, Value: 11:33:00 AM, Type: System.String}
{Cell: A6, Display: 1/1/2016, Value: 1/1/2016, Type: System.String}
{Cell: A7, Display: 1/1/2016  11:33:00 AM, Value: 1/1/2016  11:33:00 AM, Type: System.String}
{Cell: A8, Display: 1264, Value: 1264, Type: System.String}

Since dates and times are just numbers technically (the integer part is the date and the decimal is the time) this gives you a way to convert or separate them. A time double of 0.0 means 00:00:00. Note that row 3 shows as a System.DateTime because, like I said, Epplus just happens to recognize that Excel Style as such but the others are Doubles.

So, you could use a Type check and avoid much of the string conversion and comparison. Again, if you are worried about badly formatted data then your approach is probably as good as any. I would suggest making the value Convert.ToDateTime("12/30/1899")) a constant somewhere before the for loops rather then recreating it every time you increment to save on some cpu cycles.