2
votes

I want to load a CSV file into a matrix using MATLAB.

I used the following code:

formatSpec = ['%*f', repmat('%f',1,20)];

fid = fopen(filename);
X = textscan(fid, formatSpec, 'Delimiter', ',', 'CollectOutput', 1);
fclose(fid);
X = X{1};

The csv file has 1000 rows and 21 columns. However, the generated matrix X has 2000 rows and 20 columns.

I tried using different delimiters like '\t' or '\n', but the result doesn't change. When I displayed X, I noticed that it contains the correct csv data but with an extra row of zeros after every 2 rows.

I also tried adding the 'HeaderLines' parameter:

`X = textscan(fid, formatSpec1, 'Delimiter', '\n', 'CollectOutput', 1, 'HeaderLines', 1);`

but this time, the result is an empty matrix.

Am I missing something?

EDIT: @horchler

I could read with no problem the 'test.csv' file. There is no extra comma at the end of each row. I generated my csv file with a python script: I read the rows of another csv file, modified these (selecting some of them and doing arithmetic operations on them) and wrote the new rows on another csv file. In order to do this, I converted each element of the first csv file into floats...

New Edit: Reading the textscan documentation more carefully, I think the problem is that my input file is neither a text file nor a string, but a file containing floats.

EDIT: three lines from the file

0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,2
1,-0.3834323,-1.92452324171,-1.2453254094,0.43455627857,-0.24571121,0.4340657,1,1,0,0,0,0.3517396202,1,0,0,0.3558122164,0.2936975319,0.4105696144,0,1,0
-0.78676,-1.09767,0.765554578,0.76579043,0.76,1,0,0,323124.235998,1,0,0,0,1,0,0,1,0,0,0,2

How was your CSV file created? Do you possibly have a trailing comma at the end of each row? If you write a file with 1000 rows and 21 columns via dlmwrite('test.csv',rand(1e3,21),',') can you read it in as expected? - horchler
@horchler please look at the edited question - bigTree
Upload your file somewhere and paste the link here so that we can give it a try. - Oleg
Try adding %*s to the end of formatSpec and see if that solves the problem. - Mohsen Nosratinia
The first and third lines have 21 elements, but the second one has 22. I used formatSpec = ['%*f' repmat('%f',1,20) '%*s'] and it worked fine. - Mohsen Nosratinia

2 Answers

2
votes

Using csvread seems a good option for a well-formed csv file. However, I also tend to read csv files with textscan, because files are sometimes badly written and its extra options help in reading them.

I face a reading problem like yours when I think the file is written a certain way but it is actually written another way. To debug it I use fgetl and print, for each line read, both the output of fgetl and its double version (see the example below). Examining the double version, you may find which character causes a problem.

In your case, I would first look for multiple consecutive delimiters (',' and '\t') and, in textscan, I would enable the 'MultipleDelimsAsOne' option (while turning off 'CollectOutput').

fid = fopen(filename);

tline = fgetl(fid);
while ischar(tline)

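    disp(tline);     % the line as text
    double(tline)    % its character codes (no semicolon, so they are printed)
    pause;           % wait for a key press before reading the next line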

    tline = fgetl(fid);
end

fclose(fid);
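
As a complement to the character dump above, counting the fields on each line can quickly show which rows are malformed. The following is only a minimal sketch under stated assumptions: the three sample lines from the question stand in for the file, and the names `sampleLines`, `nFields` and `badRows` are illustrative, not part of the original code.

```matlab
% Three sample lines copied from the question, standing in for the file.
sampleLines = {
    '0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,2'
    '1,-0.3834323,-1.92452324171,-1.2453254094,0.43455627857,-0.24571121,0.4340657,1,1,0,0,0,0.3517396202,1,0,0,0.3558122164,0.2936975319,0.4105696144,0,1,0'
    '-0.78676,-1.09767,0.765554578,0.76579043,0.76,1,0,0,323124.235998,1,0,0,0,1,0,0,1,0,0,0,2'
    };

% A line with k commas has k+1 comma-separated fields.
nFields = cellfun(@(s) sum(s == ',') + 1, sampleLines);

% Lines whose field count differs from the most common count are suspect.
badRows = find(nFields ~= mode(nFields));

disp(nFields.');   % field count of each line
disp(badRows);     % indices of the lines that deviate
```

For a real file, the same count could be computed on each `tline` inside the fgetl loop instead of on a hard-coded cell array.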
2
votes

How about using a regex?

X=[];
fid = fopen(filename);
while true
  fl = fgetl(fid);
  if ~ischar(fl),   break,   end
  r = regexp(fl,'([-]*\d+[.]*\d*)','match');
  r = str2double(r(1:21)); % keep 21 matches (your 2nd line somehow has 22 elements)
  % and convert them from strings to doubles;
  % all lines must have the same # of elements or an error will be thrown
  % Error: CAT arguments dimensions are not consistent.
  X = [X; r];
end
fclose(fid);
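
If the goal is a numeric matrix, splitting on the comma and converting may be simpler than a regex. The following is a sketch under stated assumptions, not the method of this answer: it assumes purely numeric, comma-delimited fields, keeps fields 2 to 21 to mirror the question's `%*f` followed by twenty `%f`, and uses the question's three sample lines instead of a file. The names `sampleLines` and `nKeep` are illustrative.

```matlab
% Three sample lines copied from the question, standing in for the file.
sampleLines = {
    '0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,2'
    '1,-0.3834323,-1.92452324171,-1.2453254094,0.43455627857,-0.24571121,0.4340657,1,1,0,0,0,0.3517396202,1,0,0,0.3558122164,0.2936975319,0.4105696144,0,1,0'
    '-0.78676,-1.09767,0.765554578,0.76579043,0.76,1,0,0,323124.235998,1,0,0,0,1,0,0,1,0,0,0,2'
    };

nKeep = 20;                               % columns to keep per row
X = zeros(numel(sampleLines), nKeep);     % preallocate the result

for k = 1:numel(sampleLines)
    % Split on every comma; empty fields are kept rather than collapsed.
    fields = strsplit(sampleLines{k}, ',', 'CollapseDelimiters', false);
    vals = str2double(fields);            % non-numeric or empty fields become NaN
    X(k, :) = vals(2:nKeep + 1);          % skip field 1, drop anything past field 21
end

disp(size(X));
```

Any extra trailing field, such as the 22nd element on the second line, is simply discarded. A line with fewer than 21 fields would raise an indexing error, which at least makes the bad row visible.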