I applied the KMeans algorithm to my dataset with two clusters. The dataset has shape (506, 13). How can I get the distance from each record to each cluster center?

I have tried computing the Euclidean distance between the cluster centers, but what I want is the distance from each record to both cluster centers.

from sklearn.cluster import KMeans

model = KMeans(n_clusters=2)
model.fit(X)
print(model.cluster_centers_)

[3.88774444e-01 1.55826558e+01 8.42089431e+00 7.31707317e-02
5.11847425e-01 6.38800542e+00 6.06322493e+01 4.44127154e+00
4.45528455e+00 3.11926829e+02 1.78092141e+01 3.81042575e+02
1.04174526e+01]
[1.22261690e+01 3.01980663e-14 1.84518248e+01 5.83941606e-02
6.70102190e-01 6.00621168e+00 8.99678832e+01 2.05447007e+00
2.32700730e+01 6.67642336e+02 2.01963504e+01 2.91039051e+02
1.86745255e+01]

**Actual results:**
from sklearn.metrics.pairwise import euclidean_distances
dists = euclidean_distances(model.cluster_centers_)
array([[  0.        , 369.34000546],
[369.34000546,   0.        ]])

**Expected results:**

rows cluster_1_distance  cluster_2_distance
 0        0.78                 0.89
 1        0.53                 0.66

1 Answer

Use the cdist function from the scipy.spatial.distance module.

As stated in the reference, it takes two matrices and returns the distance between every pair of rows, one row taken from each matrix. Use the metric argument to choose the distance function.

In your case,

from scipy.spatial.distance import cdist
dists = cdist(X,model.cluster_centers_,metric='euclidean') #shape of dists : (506,2)
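
For a version that runs end to end, here is a minimal sketch. The random X is only a stand-in for your data with the same shape (an assumption, since the original dataset is not shown), and n_init and random_state are set just to make the run repeatable. It also compares the cdist result against KMeans.transform, which returns the distance from each sample to each cluster center.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

# Synthetic stand-in for the asker's data (assumption): same shape, random values.
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))

# n_init and random_state are illustrative choices, not part of the question.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(X)

# Distance from every record to every cluster center: one row per record,
# one column per cluster.
dists = cdist(X, model.cluster_centers_, metric="euclidean")

# KMeans.transform maps X into the same cluster-distance space.
dists_via_transform = model.transform(X)

# Index of the closest center for each record.
nearest = dists.argmin(axis=1)

print(dists.shape)
print(np.allclose(dists, dists_via_transform))
print(dists[:2])
```

Column 0 of dists corresponds to cluster_1_distance in your expected table and column 1 to cluster_2_distance. The nearest array should normally agree with model.labels_, apart from possible ties or floating-point edge cases.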