1
votes

I am trying to process a set of datasets: for each one, count the number of entries below a given error rate, then plot a bar chart with the dataset on the X axis and the count on the Y axis. It seems that the counts are stored in a cell array, which bar does not accept. How can I store them in an ordinary array instead of a cell array?


DATASET_SIZE = 100;
PRUN_MAX_ERROR = 2;
PRUN_MISSING_DATA = -1.000;
ERROR_RATE = 0.2;

for i=1:DATASET_SIZE
   fid = fopen(strcat('log',int2str(i),'.txt'),'r');
   C(i) = textscan(fid, '%.3f');            
   fclose(fid);
end

%% convert cell type to matrix & process data
for i=1:DATASET_SIZE   
   D = cell2mat(C(i));
   %     removing unwanted entries
   D(D == PRUN_MISSING_DATA) = [];      
   D(D > PRUN_MAX_ERROR) = [];       

   %     count number of occurrences below certain error rate
   % E = [E sum(D <= ERROR_RATE)];
   E{i} = sum(D <= ERROR_RATE);
end 

figure;
bar(E);

But I get this error:

Undefined function 'real' for input arguments
of type 'cell'.

Error in xychk (line 42)
    x = real(y); y = imag(y);

Error in bar (line 54)
        [msg,x,y] =
        xychk(args{1:nargs},'plot');

Error in checkSeqEffects (line 53)
bar(E); 

2 Answers

2
votes

You have quite a few problems here. I'll discuss each of them first, before addressing the real problem.

First, the line

D = cell2mat(C(i));

can be replaced by

D = C{i};

Parentheses (()) index into the cell array and return a cell (here, a 1-by-1 cell array), whereas curly braces ({}) return the contents of the cell at the given index. It's important you learn the difference.
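
To make the difference concrete, here is a minimal sketch. The data in C is made up for illustration and is not taken from your log files:

```matlab
% Hypothetical data, for illustration only (not from the log files)
C = {[0.1; 0.5; 1.2], [0.3; 0.05]};

a = C(1);   % parentheses: a 1-by-1 cell array wrapping the first vector
b = C{1};   % curly braces: the 3-by-1 double vector itself

disp(class(a))                   % cell
disp(class(b))                   % double
disp(isequal(cell2mat(a), b))    % 1, i.e. cell2mat(C(1)) and C{1} agree
```

So cell2mat(C(i)) is not wrong here, it just takes a detour through a 1-by-1 cell to get what C{i} gives directly.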

Then, you are growing the cell array E inside the loop. Matlab may have to reallocate it as it grows, so the loop will run slower than it needs to. Just preallocating it before the loop

E = cell(DATASET_SIZE,1);
for i=1:DATASET_SIZE
    ...
    E{i} = ...
end

will speed things up.

Then, the name i should be avoided for variables, since it also denotes the imaginary unit. The same holds for j. Calling loop indices ii or jj avoids Matlab having to look up whether you mean the imaginary unit (which is properly written as 1i or 1j in Matlab) or the loop index, which will save a bit of time and, most of all, avoid any confusion.
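
A small sketch of the confusion this avoids. The values are arbitrary and only serve as illustration:

```matlab
% Arbitrary values, for illustration only
i = 2;               % shadows the built-in imaginary unit i

w = 3 + 4*i;         % uses the variable: 3 + 4*2 = 11
z = 3 + 4i;          % the literal 4i is always imaginary: 3.0000 + 4.0000i

disp(w)              % 11
disp(abs(z))         % 5

clear i              % restores the built-in meaning of i
```

With a loop index named ii there is nothing to shadow, and 1i stays unambiguous everywhere.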

Now, the actual problem: bar(E). Typing help bar in the Matlab command prompt shows you this:

BAR Bar graph.

BAR(X,Y) draws the columns of the M-by-N matrix Y as M groups of N
vertical bars.  The vector X must not have duplicate values.

This tells you that bar() expects an m-by-n matrix, and you're passing it a cell-array. The quickest fix is

bar([E{:}].')

but this will take too long to explain :) The better way to do it is to never make E a cell-array at all (it's not needed):

% convert cell type to matrix & process data
E = zeros(DATASET_SIZE,1);
for ii = 1:DATASET_SIZE   

   D = C{ii};

   % remove unwanted entries
   D(D == PRUN_MISSING_DATA) = [];      
   D(D > PRUN_MAX_ERROR) = [];       

   % count number of occurrences below certain error rate       
   E(ii) = sum(D <= ERROR_RATE);
end 

figure, clf, hold on
bar(E)

Now E is an ordinary array, so bar(E) will work fine.
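
For completeness, a short sketch of why the quick fix [E{:}].' also works, assuming every cell of E holds a scalar count. The counts below are made up for illustration:

```matlab
% Hypothetical counts stored in a cell array, as in the original loop
E = {3, 7, 2};

% E{:} expands to the comma-separated list E{1}, E{2}, E{3},
% so [E{:}] is the same as [E{1}, E{2}, E{3}]: a 1-by-3 double row vector
v = [E{:}];

% .' is the plain (non-conjugate) transpose, giving a 3-by-1 column
v = v.';

figure;
bar(v);              % bar accepts v because it is an ordinary numeric array
```

This is handy when you are handed a cell array by some other function, but when you build E yourself it is simpler to keep it numeric from the start.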

I suggest you read up on cell arrays and learn when to use them and, more importantly, when not to. There's a tonne of questions right here on Stack Overflow (in the 'matlab' tag) that address or involve cell arrays, many of which also discuss their proper (and improper) usage.

1
votes

Have you tried switching these two lines?

   E = [E sum(D <= ERROR_RATE)];
   % E{i} = sum(D <= ERROR_RATE);

The first will make a normal array and the second a cell array. Probably better to go

E(i) = sum(D <= ERROR_RATE)

i.e. use round brackets instead of curly brackets. But note that this assumes sum(D <= ERROR_RATE) has the same dimensions on every iteration, and for E(i) it must be a scalar. Does it? Are you expecting a scalar? If D can be a matrix, you might want to try sum(sum(D <= ERROR_RATE)) (i.e. if you are making a normal 2D bar chart, not a 3D one).
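
A rough sketch of the dimension issue, using a made-up 2-by-2 matrix D. With the single-column output that textscan returns for a single '%f' specifier, D is a column vector and the plain sum is already a scalar:

```matlab
% Hypothetical matrix data, for illustration only
ERROR_RATE = 0.2;
D = [0.10 0.50;
     0.15 0.05];

perColumn = sum(D <= ERROR_RATE);        % sums each column: [2 1], not a scalar
total1    = sum(sum(D <= ERROR_RATE));   % 3
total2    = sum(D(:) <= ERROR_RATE);     % 3, D(:) flattens D to a column first
total3    = nnz(D <= ERROR_RATE);        % 3, counts the nonzero (true) entries

disp(perColumn)
disp([total1 total2 total3])
```

Any of the last three gives a single count regardless of the shape of D, so E(i) = ... will not fail with a dimension mismatch.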