[Edited:] I have a file data2007a.csv, and I copied and pasted (using TextEdit on a MacBook) its first few consecutive lines into a new file, datatest1.csv, for testing:

Nomenclature,ReporterISO3,ProductCode,ReporterName,PartnerISO3,PartnerName,Year,TradeFlowName,TradeFlowCode,TradeValue in 1000 USD
S3,ABW,0,Aruba,ANT,Netherlands Antilles,2007,Export,6,448.91
S3,ABW,0,Aruba,ATG,Antigua and Barbuda,2007,Export,6,0.312
S3,ABW,0,Aruba,CHN,China,2007,Export,6,24.715
S3,ABW,0,Aruba,COL,Colombia,2007,Export,6,95.885
S3,ABW,0,Aruba,DOM,Dominican Republic,2007,Export,6,11.432

I wanted to use textscan to read only columns 2, 3 and 5 (starting from the second row) into MATLAB, so I wrote the following code:

clc,clear all
fid = fopen('datatest1.csv');
data = textscan(fid,'%*s %s %d %*s %s %*[^\n]',...
       'Delimiter',',',...
       'HeaderLines',1);
fclose(fid);

But I ended up with only the second row of columns 2,3 and 5:

[Screenshot: textscan output containing only one row]


I then kept the first row of data2007a.csv, selected several other rows, and saved them as datatest2.csv:

Nomenclature,ReporterISO3,ProductCode,ReporterName,PartnerISO3,PartnerName,Year,TradeFlowName,TradeFlowCode,TradeValue in 1000 USD
S3,ABW,1,Aruba,USA,United States,2007,Export,6,1.392
S3,ABW,1,Aruba,VEN,Venezuela,2007,Export,6,5633.157
S3,ABW,2,Aruba,ANT,Netherlands Antilles,2007,Export,6,310.734
S3,ABW,2,Aruba,USA,United States,2007,Export,6,342.42
S3,ABW,2,Aruba,VEN,Venezuela,2007,Export,6,63.722
S3,AGO,0,Angola,DEU,Germany,2007,Export,6,105.334
S3,AGO,0,Angola,ESP,Spain,2007,Export,6,8533.125

And I wrote:

clc,clear all
fid = fopen('datatest2.csv');
data = textscan(fid,'%*s %s %d %*s %s %*[^\n]',...
       'Delimiter',',',...
       'HeaderLines',1);
fclose(fid);  
data{1}

It gives exactly what I wanted:
[Two screenshots: textscan output containing all rows]

When I use the same code on my original data file, data2007a.csv, it behaves as in the first case: only one row is read.

What is going wrong and how can I fix it?


[Added:] If one replicates my experiments [1], one finds that both cases work and the problem does not exist! I really don't know what is going on.

[1] By "replicate" I mean copying and pasting the data given above and saving it as two new files, say, datatest4a.csv and datatest4b.csv. I used visdiff('datatest1.csv', 'datatest4a.csv') to compare the two files, and it returned:

[Screenshot: visdiff comparison of datatest1.csv and datatest4a.csv]

Your problem is not reproducible. I suspect that the file you're using in MATLAB differs from the one shown in your post. In your post, you said that you had a file named data.csv, but in MATLAB you opened datatest.csv. I suspect that this is not just a typo - Sardar Usama
@Sardar_Usama: Thanks for pointing that out. It is indeed a typo. I will edit it right away. - Jack
You missed the main point. Your problem isn't reproducible - Sardar Usama
@Sardar_Usama: Sorry, I don't quite understand your comment. Would you elaborate what you mean by "problem is not reproducible"? - Jack
That it does not reproduce the result that you showed in the screenshot. Rather it produces the result that you're looking for, i.e. it contains all the rows (starting from the second row) of the specified columns - Sardar Usama

1 Answer

Given how you fixed it, I think this is an end-of-line character issue. This sometimes comes up when moving text files between Windows and Unix-based systems, as they use different conventions.

When you add %*[^\n] to the end of a textscan format, as you have here, it means "skip everything up to the end of the line". But if textscan expects a specific end-of-line character and can't find one, it will skip everything up to the end of the file. This would explain why you get one row read correctly and then nothing else.
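To check whether this is what is happening, you could look at the raw bytes of the file and count the end-of-line characters. The following is only a rough sketch: the file name is the one from the question, and the 2000-byte sample size is an arbitrary choice of mine.

```matlab
% Read the first 2000 bytes of the file without any text translation.
fid = fopen('data2007a.csv', 'r');
raw = fread(fid, 2000, '*uint8')';   % row vector of byte values
fclose(fid);

nCR   = sum(raw == 13);              % carriage returns (\r)
nLF   = sum(raw == 10);              % line feeds (\n)
nCRLF = numel(strfind(char(raw), sprintf('\r\n')));  % \r\n pairs

fprintf('CR: %d, LF: %d, CRLF pairs: %d\n', nCR, nLF, nCRLF);
```

If the count of LF is zero while CR is not, the file uses bare \r line endings, and a skip that runs "until \n" would plausibly run to the end of the file. Running the same check on datatest1.csv and datatest4a.csv should show whether the two files differ in their line endings.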

If you don't specify what the end-of-line character is, MATLAB appears to default to... something... according to this not very clear statement in the help:

The default end-of-line sequence is \n, \r, or \r\n, depending on the contents of your file.

One way to try to cure this without having to create a new file would be to add 'EndOfLine', '\r\n' to your textscan call. According to the help:

If you specify '\r\n', then textscan treats any of \r, \n, and the combination of the two (\r\n) as end-of-line characters.

This will hopefully handle most standard(ish) EOL conventions. It is likely that copy-pasting and saving with a different piece of software than the one originally used to create the file changed the end-of-line characters so that MATLAB was able to recognise them.
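For reference, the call from the question with that option added would look roughly like this. It is a sketch of the suggestion above, not something I have tested against the original data2007a.csv, so whether it fixes the problem depends on which line endings that file actually contains.

```matlab
fid = fopen('data2007a.csv');
% Same format as in the question: keep columns 2, 3 and 5, skip the rest.
% 'EndOfLine','\r\n' asks textscan to accept \r, \n or \r\n as a line end.
data = textscan(fid, '%*s %s %d %*s %s %*[^\n]', ...
       'Delimiter', ',', ...
       'HeaderLines', 1, ...
       'EndOfLine', '\r\n');
fclose(fid);

% If all rows were read, the three output columns have the same length.
disp(cellfun(@numel, data));
```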