
I would like to compare each of the following data sets

a = '235 148 89 19 222';
b = '112 128 144 160 176';
c = '192 192 192 192 192';
d = '64 64 64 64 64';

with

y = [230 138 79 15 212];

I then want to calculate the correlation coefficient between each data set and y, and display the string with the highest correlation coefficient. I can compute it for a single pair with the commands

c = corrcoef( a, y );   
c = abs(c(2,1)); 

But how do I iterate over the data sets with a for loop and display the one with the highest correlation coefficient?

Here is the code I have written so far, but I don't know how to proceed with the for loop:

a = '235 148 89 19 222';
b = '112 128 144 160 176';
c = '192 192 192 192 192';
d = '64 64 64 64 64';

y = '230 138 79 15 212';

s = {a;b;c;d};
s = cellfun(@strsplit, s, 'UniformOutput', false);
s = vertcat(s{:});

for i = 1:size(s,1)

end

2 Answers


First of all, the easy way to convert a string of numbers to a numeric array is str2num:

>> an = str2num(a)

an =

   235   148    89    19   222

To stack the strings as rows, use char (it pads the shorter strings with trailing spaces), then convert the result to a numeric matrix:

>> S = char(a,b,c,d)

S =

235 148 89 19 222  
112 128 144 160 176
192 192 192 192 192
64 64 64 64 64     

>> N = str2num(S)

N =

   235   148    89    19   222
   112   128   144   160   176
   192   192   192   192   192
    64    64    64    64    64
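
As a side note, str2num evaluates its input as an expression, so for plain lists of numbers some people prefer sscanf or str2double. A minimal sketch of both, assuming the strings contain only space-separated numbers with no trailing spaces (the variable names an1 and an2 are just for illustration):

```matlab
a = '235 148 89 19 222';

% sscanf returns a column vector, so transpose it to get a row
an1 = sscanf(a, '%f').';

% strsplit splits on whitespace by default; str2double converts the cell array
an2 = str2double(strsplit(a));

disp(an1);
disp(isequal(an1, an2));
```

Either result can be used in place of str2num(a) in the steps that follow.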

Then all you need to do is go through the matrix row by row. First get its size:

>> [rows,columns] = size(N)

rows =

     4


columns =

     5

We need to iterate over all the rows. A single row is selected like this:

>> N(1,:)

ans =

   235   148    89    19   222

From the MATLAB help:

R = corrcoef(A,B) returns coefficients between two random variables A and B.

>> R = corrcoef(N(1,:),y)

R =

    1.0000    0.9995
    0.9995    1.0000

So, applying your measure inside a loop (with y as the numeric vector from your question):

>> for i = 1:rows
R = corrcoef(N(i,:),y);
rr(i) = abs(R(2,1));
end
>> rr

rr =

    0.9995    0.2789       NaN       NaN
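
The two NaN entries belong to the constant rows (192 ... and 64 ...). A constant row has zero variance, so the correlation formula ends up dividing zero by zero. A small sketch that computes the coefficient by hand to show this, assuming the usual Pearson definition (xc, yc and r are illustrative names):

```matlab
x = [192 192 192 192 192];
y = [230 138 79 15 212];

xc = x - mean(x);   % all zeros, because x is constant
yc = y - mean(y);

% numerator and denominator are both 0, and 0/0 is NaN in MATLAB
r = sum(xc .* yc) / sqrt(sum(xc.^2) * sum(yc.^2));

disp(isnan(r));
```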

Finally, take the max of that vector (max skips NaN values); its position is the row you want:

>> [value,position] = max(rr)

value =

    0.9995


position =

     1

>> N

N =

   235   148    89    19   222
   112   128   144   160   176
   192   192   192   192   192
    64    64    64    64    64

>> N(position,:)

ans =

   235   148    89    19   222
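
Putting the steps together as a script, here is a minimal sketch that also prints the original string, as asked in the question. It assumes y is the numeric vector and uses sscanf for parsing; the names strs, rr, best and idx are only illustrative:

```matlab
a = '235 148 89 19 222';
b = '112 128 144 160 176';
c = '192 192 192 192 192';
d = '64 64 64 64 64';
y = [230 138 79 15 212];

strs = {a; b; c; d};

% parse every string into a numeric row and stack the rows into a matrix
N = cell2mat(cellfun(@(s) sscanf(s, '%f').', strs, 'UniformOutput', false));

% absolute correlation of each row with y (NaN for constant rows)
rr = nan(size(N, 1), 1);
for i = 1:size(N, 1)
    R = corrcoef(N(i, :), y);
    rr(i) = abs(R(1, 2));
end

% max ignores NaN entries unless every entry is NaN
[best, idx] = max(rr);
fprintf('Best match: "%s" (|r| = %.4f)\n', strs{idx}, best);
```

For the data in the question this selects the first string, matching the result above.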

Is there a special reason for using strings and cells instead of integer values and matrices?

What about the following solution:

a = [235 148 89 19 222;...
    112 128 144 160 176;...
    192 192 192 192 192;...
    64 64 64 64 64];

b = zeros(size(a));

y = [230 138 79 15 212];

% subtract y from each row of a
for i = 1:size(a,1)
    b(i,:) = a(i,:) - y;
end

% index of the row with the smallest sum of absolute differences
[~, minLine] = min(sum(abs(b), 2));

disp(minLine);

Here every row of the matrix a is compared with the vector y by computing the difference and storing it in a second matrix b. After the loop, the row with the smallest sum of absolute differences is selected. Note that this measures how close each row is to y, not the correlation coefficient, so it can pick a different row than corrcoef would.
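
To see how the two measures can disagree, here is a small sketch with made-up rows (the data below is hypothetical and not from the question). One row is y shifted by a constant, which correlates perfectly but is far away; the other is numerically close to y. bsxfun is used so the subtraction also works on older MATLAB releases:

```matlab
y = [230 138 79 15 212];

shifted = y + 100;               % same shape as y, offset by a constant
nearby  = [225 140 85 20 205];   % close to y, but not the same shape
rows = [shifted; nearby];

% sum of absolute differences per row
dist = sum(abs(bsxfun(@minus, rows, y)), 2);
[~, iDist] = min(dist);

% absolute correlation coefficient per row
r = zeros(size(rows, 1), 1);
for i = 1:size(rows, 1)
    R = corrcoef(rows(i, :), y);
    r(i) = abs(R(1, 2));
end
[~, iCorr] = max(r);

fprintf('closest row: %d, most correlated row: %d\n', iDist, iCorr);
```

Which measure is appropriate depends on whether an offset between the data sets should count as a mismatch.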