3
votes

I'm working on a problem on Matlab according to Matrix. I think my code could be improved by remove the for loop. But I really don't know how to fix this one. Can anyone help me, please? the code is:

K = 3;
X = [1 2; 3 4; 5 6; 7 8];
idx = [1;2;3;1];
for i = 1:K
    ids = (idx == i);
    centroids(i,:) = sum(bsxfun(@times, X, ids))./ sum(ids);
end

in this code, data points X is 4x2. There are K=3 centroids, so the centroids is a matrix of 3x2. This code is part of a K-mean function, which is using data points and their closest centroids to find new position of centroids. I want to make the code as something without the FOR loop, maybe beginning like this:

ids = bsxfun(@eq, idx, 1:K);
centroids = ..............
2

2 Answers

4
votes

You could apply accumarray. Note that accumarray only works when X is a column. So, if X has two columns, you can call accumarray twice:

centroids(:,1) = accumarray(idx, X(:,1), [], @mean)
centroids(:,2) = accumarray(idx, X(:,2), [], @mean)

Alternatively, if X contains two columns of real numbers, you can use complex to "pack" the two columns into one complex column, and then unpack the results:

centroids = accumarray(idx, complex(X(:,1),X(:,2)), [], @mean);
centroids = [ real(centroids) imag(centroids)];

If X has an arbitrary number of columns, possibly with complex numbers, you can loop over columns:

centroids = NaN(K, size(X,2)); %// preallocate
for col = 1:size(X,2);
    centroids(:,col) = accumarray(idx, X(:,col), [], @mean);
end
4
votes

You can avoid the bsxfun by using logical indexing, this seems to be a worthwhile performance increase, at least for small matrices X. It is best for small K, and for a small number of rows of X.

K = 3;
X = [1 2; 3 4; 5 6; 7 8];
idx = [1;2;3;1];
centroids=zeros(K,2);
for i = 1:K
    ids = (idx == i);
    centroids(i,:) = sum(X(ids,:),1)./sum(ids);
end

If X has a large number of rows, this method is fastest:

K = 3;
X = [1 2; 3 4; 5 6; 7 8];
idx = [1;2;3;1];
centroids=zeros(K,2);
t=bsxfun(@eq,idx,1:K);
centroids=bsxfun(@rdivide,t.'*X,sum(t).');

And if K is very large, Luis' accumarray method is fastest.