
I am trying to load a text file into MATLAB as a matrix and then select a subset of the data based on user input.

These are the first few rows of the data.

The United States of America, Deaths (1x1)     Last modified: 16-Nov-2012, MPv5 (May07)

Year     Age        Female             Male            Total
1933      0          52615.77         68438.11        121053.88
1933      1           8917.13         10329.16         19246.29
1933      2           4336.92          5140.05          9476.97
1933      3           3161.59          3759.88          6921.47
1933      4           2493.84          2932.59          5426.43
1933      5           2139.87          2537.53          4677.40
1933      6           1939.70          2337.76          4277.46
1933      7           1760.47          2163.90          3924.37
1933      8           1602.20          2015.97          3618.17
1933      9           1464.88          1893.96          3358.84

A larger part of the data is present here: https://www.dropbox.com/s/b4njypwmrxwxzl7/USA.Deaths_1x1.txt?dl=0

The problem I am facing is that every time I use T = readtable() to read in the data, T comes out as an m x 1 table rather than an m x 5 table.

I also tried converting the txt file to a csv file, but the data contains non-numeric entries.

How can I solve this problem?

Thanks.

look at the doc for importdata then try: datatable=importdata('USA.Deaths_1x1.txt',' ',3) - Hoki
@Hoki do you know why it only imported 111 rows and not all of the 10000+ rows? - Bill Li
yeah the import format stumbled at the first hiccup. I made a more robust answer (it imports the whole file) but a few lines remain faulty (for the old people, poor them). - Hoki
I think it only imports 111 rows because the age on the 111th row is 110+, which is not numeric. There is a 110+ age for each year, so that's about one in every 111 rows - Bill Li
Can you show me what you came up with? Thanks - Bill Li

1 Answer


For your data format, most of the straightforward import functions (importdata, dlmread, etc.) will fail.

textscan has a few parameters that let you import the full file without breaking at the first irregular line. However, the few faulty lines will contain NaN in place of the values that could not be parsed.

%// Define special values which can be encountered
specialValues = {'110+','other_special_values'} ;
formatSpec = '%n%n%f%f%f' ;

%// Read the file, treating special values 
fileID = fopen('USA.Deaths_1x1.txt');
C = textscan(fileID, formatSpec, ...
    'delimiter'     , ' ', ...
    'headerlines'   ,3, ...
    'treatAsEmpty'  , specialValues, ...
    'MultipleDelimsAsOne',1 );

fclose(fileID);

%// Convert cell array to matrix
data = cell2mat(C) ;

If you really need the data from the faulty lines, you'll have to write a more custom parser with the low-level function fscanf and account for every edge case (unconventional line) that you may encounter.
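A lighter alternative to a full fscanf parser, if the only irregular entries are the open-ended age group '110+' (that is an assumption based on the comments, I have not checked the whole file), is to read the Age column as text and strip the '+' afterwards. The sketch below does that; the function name readDeaths is made up for illustration, and it assumes the same three header lines as above. Save it as readDeaths.m.

```matlab
function data = readDeaths(filename)
%READDEATHS Sketch: import the deaths file, keeping the '110+' rows.
%   Hypothetical helper, not part of MATLAB. Assumes 3 header lines and
%   that a trailing '+' in the Age column is the only non-numeric entry.

    fileID = fopen(filename);
    if fileID == -1
        error('Could not open file: %s', filename);
    end

    % Read Age as a string (%s) so '110+' does not break the numeric parse.
    % With the default delimiter, runs of whitespace separate the fields.
    C = textscan(fileID, '%f %s %f %f %f', 'HeaderLines', 3);
    fclose(fileID);

    % Remove the '+' and convert the age strings to numbers ('110+' -> 110).
    age = str2double(strrep(C{2}, '+', ''));

    % Assemble an m x 5 numeric matrix: Year, Age, Female, Male, Total.
    data = [C{1}, age, C{3}, C{4}, C{5}];
end
```

Calling data = readDeaths('USA.Deaths_1x1.txt') should then give one row per line of the file. If other columns also contain non-numeric entries, the concatenation on the last line may fail because the columns end up with different lengths, in which case you are back to handling those cases one by one.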