Example:
load kmeansdata %provides X variable
Y=bsxfun(@minus,X,mean(X,2))'/sqrt(size(X,2)-1); %normalized and means adjusted
[~,~,PC] = svd(Y); %right singular vectors, one row per observation
plot(PC(:,1),PC(:,2),'m.','markersize',15)
Plot the first two columns and you will get what looks like 3 clusters. I want to identify these clusters using kmeans, and plot the clusters in different colours as proof. I tried:
[idx,cntrd] = kmeans(PC(:,1:2),3,'Distance','sqEuclidean');%,'Distance','correlation');
cluster=3;
Col = {'.b','.r','.g','.y','.m','.c','.k'}; % Cell array of colours.
figure;
hold on
for clus=1:cluster
plot(PC(idx==clus,1),PC(idx==clus,2),Col{clus},'MarkerSize',12)
end
plot(cntrd(:,1),cntrd(:,2),'kx','MarkerSize',15,'LineWidth',3) %plotting the centroids of the clusters
The cluster centroids are off, and the colours aren't what I expected either. Can anyone help?
EDIT: Somewhat answered:
I copied this code from the MathWorks site and replaced my kmeans line:
opts = statset('Display','final');
[idx,C] = kmeans(PC(:,1:2),3,'Distance','cityblock',...
'Replicates',5,'Options',opts);
It seems to work, but I don't quite understand what opts does. Replicates, I assume, just repeats kmeans 5 times and picks some kind of average for the centroids. I've also restarted MATLAB in case there was some sort of glitch.
EDIT: ignore above:
I thought the problem was resolved, so I then tried looking into finding appropriate k values. I entered k=1, ran everything, then k=2, then k=3, and I noticed I got the same mistake again.
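For what it's worth, trying several k values by hand can be wrapped in a loop. The following is only a rough sketch, assuming the Statistics Toolbox is available; the names scores, kList and total are made up for illustration, the range 1:5 is arbitrary, and comparing the total within-cluster distance across k (an "elbow" style check) is just one possible way to look at it:

```matlab
% Sketch only: compare total within-cluster distance across several k.
load kmeansdata                          % provides X
Y = bsxfun(@minus,X,mean(X,2))'/sqrt(size(X,2)-1);
[~,~,PC] = svd(Y);
scores = PC(:,1:2);                      % the two columns plotted above

kList = 1:5;                             % candidate k values (arbitrary range)
total = zeros(size(kList));
for i = 1:numel(kList)
    % third output is the within-cluster sum of distances, one per cluster
    [~,~,sumd] = kmeans(scores,kList(i),'Replicates',5);
    total(i) = sum(sumd);                % lower means tighter clusters
end

figure;
plot(kList,total,'-o')
xlabel('k'); ylabel('total within-cluster distance')
```

Using one variable for k in both the kmeans call and any plotting loop avoids the two getting out of step when k is changed.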
... gscatter. And secondly, have you tried using the 'Replicates',5 option but sticking with the default Euclidean distance rather than using cityblock? Also try leaving off the opts part, maybe you don't need it... - Dan
... opts is doing. It seems like the Display property only affects the console output of your function, i.e. what feedback it gives you. btw I think you are right re replicate: mathworks.com/help/stats/kmeans.html#bueftl4-1 - Dan
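A minimal sketch of what the comments suggest, assuming the Statistics Toolbox is available: keep the default distance, use 'Replicates',5, and colour the points with gscatter. The names scores and k are illustrative and not from the original post. Per the kmeans documentation, the replicates are separate runs from new starting centroids, and the run with the lowest total within-cluster distance is returned rather than an average of the runs.

```matlab
% Sketch only: reuses the question's data and preprocessing.
load kmeansdata                          % provides X (560-by-4)
Y = bsxfun(@minus,X,mean(X,2))'/sqrt(size(X,2)-1);
[~,~,PC] = svd(Y);
scores = PC(:,1:2);                      % the two columns plotted in the question

k = 3;
opts = statset('Display','final');       % only changes what kmeans prints
% Default distance (squared Euclidean). Of the 5 runs, kmeans keeps the
% one with the lowest total within-cluster distance; it does not average.
[idx,cntrd,sumd] = kmeans(scores,k,'Replicates',5,'Options',opts);
fprintf('total within-cluster distance: %g\n',sum(sumd));

figure;
gscatter(scores(:,1),scores(:,2),idx,'brg','.',12);  % colour by cluster
hold on
plot(cntrd(:,1),cntrd(:,2),'kx','MarkerSize',15,'LineWidth',3)  % centroids
hold off
```

Dropping the 'Options',opts pair should leave idx and cntrd unaffected and only silence the printed summary.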