What you are asking for can be solved in multiple ways. Here are two:
- The first way is to simply define the line that separates your clusters. Since you already know how your points should be grouped (by a line), you can use that line directly as the rule.
If your line is the diagonal y = x through the origin, simply check whether x > y:
x<- c(4,4,5,5,6,7,8,9,9,10,2,3,3,4,5,6,6,7,8,8)
y<- c(2,3,3,4,4,5,5,7,6,8,4,5,6,5,7,8,9,9,9,10)
thePoints <- cbind(x,y)
as.integer(thePoints[,1] > thePoints[,2])
[1] 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
This puts the points below the diagonal y = x (those with x > y) in one group and all other points in the other group. Keep in mind that if your separating line does not pass through the origin, or has a slope other than 1, you have to modify this check accordingly, e.g. by comparing y with a*x + b.
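To make that modification concrete, here is a minimal sketch for an arbitrary line y = a*x + b. The helper name `belowLine` and its parameters `a` and `b` are hypothetical, not part of the original answer; with the defaults it reproduces the x > y rule above.

```r
# Hypothetical helper (not from the original answer): returns 1 for points strictly
# below the line y = a*x + b and 0 for all others, including points exactly on the line.
belowLine <- function(pts, a = 1, b = 0) {
  as.integer(pts[, 2] < a * pts[, 1] + b)
}

x <- c(4,4,5,5,6,7,8,9,9,10,2,3,3,4,5,6,6,7,8,8)
y <- c(2,3,3,4,4,5,5,7,6,8,4,5,6,5,7,8,9,9,9,10)
thePoints <- cbind(x, y)

# Defaults (a = 1, b = 0): the diagonal through the origin, same result as x > y
belowLine(thePoints)
# A line that does not pass through the origin, e.g. y = x - 1.5
belowLine(thePoints, a = 1, b = -1.5)
```

Whether points lying exactly on the line go to the 0 group or the 1 group is a choice; here they go to 0.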
- The second way is K-means with a correlation distance:
The K-means function:
myKmeans <- function(x, centers, distFun, nItter=10) {
  clusterHistory <- vector(nItter, mode="list")
  centerHistory <- vector(nItter, mode="list")
  for(i in 1:nItter) {
    # Distance from every point (row of x) to every center (row of centers)
    distsToCenters <- distFun(x, centers)
    # Assign each point to its nearest center
    clusters <- apply(distsToCenters, 1, which.min)
    # Recompute each center as the mean of the points assigned to it
    centers <- apply(x, 2, tapply, clusters, mean)
    # Saving history
    clusterHistory[[i]] <- clusters
    centerHistory[[i]] <- centers
  }
  list(clusters=clusterHistory, centers=centerHistory)
}
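`myKmeans` does not care which distance it uses: any `distFun` works as long as it returns a matrix with one row per point and one column per center. As an illustration (the name `myEuclid` is hypothetical), here is an ordinary Euclidean distance with that interface; passing it instead of a correlation distance turns `myKmeans` into standard Euclidean K-means.

```r
# Hypothetical Euclidean distance with the interface myKmeans expects:
# rows of points1 are data points, rows of points2 are centers, and the result
# is a nrow(points1) x nrow(points2) matrix of distances.
myEuclid <- function(points1, points2) {
  # ||p - c||^2 = ||p||^2 + ||c||^2 - 2 * p.c, computed for all pairs at once
  sq <- outer(rowSums(points1^2), rowSums(points2^2), "+") - 2 * points1 %*% t(points2)
  sqrt(pmax(sq, 0))  # clamp tiny negative values caused by rounding
}

pts <- rbind(c(0, 0), c(3, 4))
ctr <- rbind(c(0, 0), c(6, 8))
myEuclid(pts, ctr)
```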
And the correlation distance:
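myCor <- function(points1, points2) {
  # Correlation between every row of points1 and every row of points2,
  # rescaled from [-1, 1] to a distance in [0, 1] (1 = perfectly anti-correlated)
  return(1 - ((cor(t(points1), t(points2))+1)/2))
}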
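Why does this distance separate the points along the diagonal? Each point has only two coordinates, and the Pearson correlation between two 2-element vectors u and v is always +1 or -1: it equals the sign of (u1 - u2)(v1 - v2). So the distance from a point to a center is 0 when both lie on the same side of y = x and 1 otherwise (a point with x == y has zero variance, and `cor` returns NA for it). The sketch below checks this with two assumed reference centers, one on each side of the diagonal.

```r
myCor <- function(points1, points2) {
  return(1 - ((cor(t(points1), t(points2)) + 1) / 2))
}

x <- c(4,4,5,5,6,7,8,9,9,10,2,3,3,4,5,6,6,7,8,8)
y <- c(2,3,3,4,4,5,5,7,6,8,4,5,6,5,7,8,9,9,9,10)
thePoints <- cbind(x, y)

# Assumed reference centers: (4, 2) lies below y = x, (8, 10) lies above it
refCenters <- rbind(c(4, 2), c(8, 10))
d <- myCor(thePoints, refCenters)

# Every distance is (up to rounding) either 0 or 1
print(unique(round(d, 8)))

# The nearest center reproduces the x > y rule: center 1 for x > y, center 2 otherwise
nearest <- apply(d, 1, which.min)
print(nearest)
```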
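# Assumed starting centers: one point from each side of the diagonal
centers <- thePoints[c(1, 20), ]
theResult <- myKmeans(thePoints, centers, myCor, 10)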
As was also shown here.
Here is how both solutions look:
par(mfrow = c(1, 2))  # draw the two plots side by side
plot(thePoints, col=as.integer(thePoints[,1] > thePoints[,2])+1, main="Using a line", xlab="x", ylab="y")
plot(thePoints, col=theResult$clusters[[10]], main="K-means with correlation distance", xlab="x", ylab="y")
points(theResult$centers[[10]], col=1:2, cex=3, pch=19)
So the outcome depends on which distance measure you use, not on some deficiency of K-means.
You can probably find better implementations of K-means with correlation distance for R than the one provided here.
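One commonly used workaround with the built-in `stats::kmeans`, sketched here under stated assumptions rather than as an equivalent of `myKmeans`, is to standardize each point (row) to mean 0 and standard deviation 1 first. For standardized rows of length n, the squared Euclidean distance equals 2(n - 1)(1 - r), so Euclidean distances between points track correlation. It is only closely related, not identical, because the cluster means are not re-standardized. It also assumes no point has x == y, and the starting centers are an assumption.

```r
x <- c(4,4,5,5,6,7,8,9,9,10,2,3,3,4,5,6,6,7,8,8)
y <- c(2,3,3,4,4,5,5,7,6,8,4,5,6,5,7,8,9,9,9,10)
thePoints <- cbind(x, y)

# Standardize each row: scale() works on columns, hence the double transpose.
# A point with x == y would have zero standard deviation and become NaN.
z <- t(scale(t(thePoints)))

# Built-in Euclidean K-means on the standardized rows.
# Assumed starting centers: the first and last points, one on each side of y = x.
km <- kmeans(z, centers = z[c(1, 20), ], algorithm = "Lloyd")
print(km$cluster)
```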