0
votes

I've built a KMeans clustering in Python on data imported from a .txt file. I've generated 100 centroids, and two figures are plotted with Matplotlib: one containing the cloud of points (loaded from the .txt file), which represents the data before clustering, and another containing black stars that mark each centroid.

What should I do to plot each centroid in a randomly chosen color, instead of as a black star? In other words, each centroid's group of points should be drawn in a different color.

The code:

import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.cluster import KMeans


#building the 22797x3 array: 


#loading the first array from .txt file, 22797x400 long.
array = np.loadtxt(r'C:\Scripts/final_array_2.txt', usecols=range(400))  # raw string so the backslash is not read as an escape

array = np.float32(array)



#plotting data before the clustering:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

ax.scatter(result[:, 0], result[:, 1], result[:, 2], alpha = 0.1)
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')



# Initializing KMeans, plotting clusters
kmeans = KMeans(n_clusters=100)
# Fitting with inputs
kmeans = kmeans.fit(result)
# Predicting the clusters
labels = kmeans.predict(result)
# Getting the cluster centers
C = kmeans.cluster_centers_
fig = plt.figure()
ax = Axes3D(fig)
ax.scatter(result[:, 0], result[:, 1], result[:, 2])
ax.scatter(C[:, 0], C[:, 1], C[:, 2], marker='*', c='#050505', s=1000)

plt.show()

Results of the code above

Data before the clustering: https://i.stack.imgur.com/IXa7R.png

Data after clustering, with the black stars: https://i.stack.imgur.com/1u4JY.png

What I need to get (a similar example, not the same cloud of points): https://i.stack.imgur.com/K5oDT.png (in my case it would be 100 colors, not just three).

Any help?

1
You can specify the color of each plotted point with c=z, where z is an array of the same shape as C[:,0]. AFAIK, the KMeans labels are enumerated in the order of the centers, so you can use c=range(len(C[:,1])). Note that this is not random; however, you could shuffle the array if you want to. - alexblae
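
A rough sketch of what this comment describes, assuming `C` is the array from `kmeans.cluster_centers_` and `ax` is a 3D Matplotlib axes. The helper names, the marker size, and the `nipy_spectral` colormap are illustrative choices, not part of the original code:

```python
import numpy as np


def shuffled_color_values(n_centroids, seed=None):
    """Return the integers 0..n_centroids-1 in random order."""
    rng = np.random.default_rng(seed)
    return rng.permutation(n_centroids)


def scatter_centroids(ax, centers, seed=None, cmap="nipy_spectral"):
    """Draw one star per centroid; each star gets its own colormap value.

    `ax` is assumed to be a 3D axes (e.g. from
    fig.add_subplot(111, projection='3d')); `centers` is an (n, 3) array.
    """
    values = shuffled_color_values(len(centers), seed)
    # c= takes one number per point; Matplotlib maps the numbers through cmap.
    return ax.scatter(centers[:, 0], centers[:, 1], centers[:, 2],
                      c=values, cmap=cmap, marker="*", s=300)
```

In the question's code this would replace the black-star call with something like `scatter_centroids(ax, C)`. It colors only the centroid markers; the data points stay in a single color.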

1 Answer

2
votes

The issue is that kmeans.cluster_centers_ returns only the center of each cluster found. What you need is to color each data point according to its label. Using the iris dataset as an example:

from sklearn import datasets
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib
import numpy as np  # needed for np.linspace below


iris = datasets.load_iris()
data = iris.data[:, 0:3]  # first three features, used as x, y, z
x = data[:, 0]
y = data[:, 1]
z = data[:, 2]
kmeans = KMeans(n_clusters=5)
kmeans = kmeans.fit(data)
labels = kmeans.predict(data)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ColorsA = plt.cm.viridis(np.linspace(0, 1, 5), alpha=0.8)  # equally spaced colors, one per cluster
for i in range(5):  # labels of the clusters
    xL = []
    yL = []
    zL = []
    for k in range(len(x)):
        if labels[k] == i:  # data points of each cluster
            xL.append(x[k])
            yL.append(y[k])
            zL.append(z[k])

    # color= (not c=): a single RGBA row passed as c can be mistaken for per-point values
    ax.scatter(xL, yL, zL, color=ColorsA[i])

plt.show()
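
For 100 clusters with random colors, the same idea can be written without the explicit loops. This is only a sketch: the function names, marker sizes, and alpha are my own choices, and it assumes `points` is an (n, 3) array, `labels` comes from `kmeans.predict`, and `centers` is `kmeans.cluster_centers_` (so row i of `centers` belongs to label i).

```python
import numpy as np


def random_palette(n_clusters, seed=None):
    """Return one random RGB color (values in [0, 1)) per cluster."""
    rng = np.random.default_rng(seed)
    return rng.random((n_clusters, 3))


def scatter_by_cluster(ax, points, labels, centers, palette):
    """Color every point, and every centroid star, by its cluster.

    `ax` is assumed to be a 3D axes; `palette` has one RGB row per cluster.
    """
    # Integer-array indexing: row k of point_colors is the color of cluster labels[k].
    point_colors = palette[labels]
    ax.scatter(points[:, 0], points[:, 1], points[:, 2],
               c=point_colors, s=5, alpha=0.3)
    # Stars use the same palette, with a black edge so they stand out.
    ax.scatter(centers[:, 0], centers[:, 1], centers[:, 2],
               c=palette, marker="*", s=300, edgecolors="black")
    return point_colors
```

With the variables from the question, this would be called as `scatter_by_cluster(ax, result, labels, C, random_palette(100))`. Random RGB values can produce colors that look alike, so with 100 clusters some neighbors may be hard to tell apart.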

Hope it helps