0
votes

I have a .csv file with the first column containing dates, a snippet of which looks like the following:

date,values
03/11/2020,1
03/12/2020,2
3/14/20,3
3/15/20,4
3/16/20,5
04/01/2020,6

I would like to import this data into MATLAB (I think the best way would probably be the readtable() function). My goal is to bring the dates into MATLAB as a datetime array. As you can see above, the problem is that the dates in the original .csv file are not consistently formatted: some are in the format mm/dd/yyyy and some are in mm/dd/yy.

Simply calling data = readtable('myfile.csv') on the .csv file results in the following, which is not correct:

'03/11/2020'    1
'03/12/2020'    2
'03/14/0020'    3
'03/15/0020'    4
'03/16/0020'    5
'04/01/2020'    6

Does anyone know a way to automatically account for this type of data in the import?

Thank you!

My version: Matlab R2017a

EDIT ---------------------------------------

Following the suggestion of Max, I have tried specifying some of the input options for the read command using the following:

T = readtable('example.csv',...
              'Format','%{dd/MM/yyyy}D %d',...
              'Delimiter', ',',...
              'HeaderLines', 0,...
              'ReadVariableNames', true)

which results in:

    date       values
    __________    ______

    03/11/2020    1     
    03/12/2020    2     
    NaT           3     
    NaT           4     
    NaT           5     
    04/01/2020    6     

and you can see that this is not working either.

2 Answers

2
votes

If you are sure that none of the dates involved go back more than 100 years, you can easily apply the pivot method, which was in common use in the last century (before the Y2K bug warned the world of the method's danger).

Years used to be coded with 2 digits only, on the understanding that 87 actually meant 1987. A user (or a computer) would add the missing century automatically.
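As a rough illustration of the pivot idea, a two-digit year can be mapped into the 100-year window that starts at the pivot. The pivot value of 1950 and the variable names are chosen only for this example.

```matlab
% Hypothetical pivot: two-digit years are placed in the window 1950..2049.
pivot = 1950;
yy = [87 20 49 50];                       % two-digit years as read from a file

% Offset of each year inside the 100-year window that starts at the pivot.
fullYear = pivot + mod(yy - pivot, 100);

disp(fullYear)                            % values: 1987 2020 2049 1950
```

Anything from 50 to 99 lands in the 1900s and anything from 00 to 49 lands in the 2000s, which is why the method only works when all dates fit inside a single 100-year span.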

In your case, you can read the full table, parse the dates, then it is easy to detect which dates are inconsistent. Identify them, correct them, and you are good to go.

With your example:

tfile = 'myfile.csv' ;                  % path to the .csv file
a = readtable(tfile) ;                  % read the file
dates = datetime(a.date) ;              % extract first column and convert to [datetime]
idx2change = dates.Year < 2000 ;        % find which dates were in the short format
dates.Year(idx2change) = dates.Year(idx2change) + 2000 ; % correct truncated years
a.date = dates                          % reinject corrected [datetime] array into the table

yields:

a = 
       date        values
    ___________    ______
    11-Mar-2020    1     
    12-Mar-2020    2     
    14-Mar-2020    3     
    15-Mar-2020    4     
    16-Mar-2020    5     
    01-Apr-2020    6     
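
If the raw text of the date column is available, an alternative sketch is to split the entries by year length and let datetime apply the pivot itself through its 'PivotYear' option. The variable names and the choice of 2000 as the pivot are assumptions for this example; the sample values are copied from the question.

```matlab
% Raw date strings as they appear in the question's file.
raw = {'03/11/2020'; '03/12/2020'; '3/14/20'; '3/15/20'; '3/16/20'; '04/01/2020'};

% Flag the entries whose year has exactly two digits.
isShort = ~cellfun(@isempty, regexp(raw, '^\d{1,2}/\d{1,2}/\d{2}$', 'once'));

dates = NaT(size(raw));                   % preallocate a datetime column

% Four-digit years are parsed as written.
dates(~isShort) = datetime(raw(~isShort), 'InputFormat', 'MM/dd/yyyy');

% Assumed pivot: two-digit years are placed in the range 2000..2099.
dates(isShort) = datetime(raw(isShort), 'InputFormat', 'M/d/yy', 'PivotYear', 2000);
```

With this approach an entry such as 3/14/20 is read as 14-Mar-2020 directly, so no year arithmetic is needed afterwards.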
1
votes

Instead of specifying the format explicitly (as I also suggested before), one should use import options; in the case of a csv file, that means delimitedTextImportOptions:

opts = delimitedTextImportOptions('NumVariables',2,...  % how many variables per row?
    'VariableNamesLine',1,...                % is there a header? If yes, on which line are the variable names?
    'DataLines',2,...                        % on which line does the actual data start?
    'VariableTypes',{'datetime','double'});  % as which data types should the variables be read?

data = readtable('myfile.csv',opts)

This works because, once the variable type is declared as datetime, the import recognizes the date format automatically, as it knows the column must be a datetime object =)
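
Whether automatic detection copes with a mix of mm/dd/yyyy and mm/dd/yy entries may depend on the release, so it seems worth checking the imported column before relying on it. The sketch below writes the question's sample to a temporary file and counts the entries that failed to parse or ended up with a truncated year. The temporary file and the variable names are assumptions for illustration only.

```matlab
% Write the sample data from the question to a temporary file.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'date,values', '03/11/2020,1', '03/12/2020,2', ...
    '3/14/20,3', '3/15/20,4', '3/16/20,5', '04/01/2020,6');
fclose(fid);

% Same import options as above: first column datetime, second column double.
opts = delimitedTextImportOptions('NumVariables', 2, ...
    'VariableNamesLine', 1, ...
    'DataLines', 2, ...
    'VariableTypes', {'datetime', 'double'});
T = readtable(fname, opts);
delete(fname);                            % clean up the temporary file

d = T{:, 1};                              % first column, expected to be datetime
nUnparsed  = sum(isnat(d));               % entries that could not be parsed (NaT)
nTruncated = sum(year(d) < 100);          % entries read with a year like 0020
fprintf('%d unparsed, %d with a truncated year\n', nUnparsed, nTruncated);
```

If either count is nonzero, the short-format rows still need a correction step such as the pivot method from the other answer.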