Visualizing clusters result using PCA (Python)

Question

I have a dataset containing 61 rows(users) and 26 columns, on which I apply clustering with k-means and others algorithms. first applied KMeans on the dataset after normalizing it. As a prior task I run k-means on this data after normalizing it and identified 10 clusters. In parallel I also tried to visualize these clusters that's why i use PCA to reduce the number of my features.

I have written the following code:

UserID  Communication_dur   Lifestyle_dur   Music & Audio_dur   Others_dur  Personnalisation_dur    Phone_and_SMS_dur   Photography_dur Productivity_dur    Social_Media_dur    System_tools_dur    ... Music & Audio_Freq  Others_Freq Personnalisation_Freq   Phone_and_SMS_Freq  Photography_Freq    Productivity_Freq   Social_Media_Freq   System_tools_Freq   Video players & Editors_Freq    Weather_Freq
1   63  219 9   10  99  42  36  30  76  20  ... 2   1   11  5   3   3   9   1   4   8
2   9   0   0   6   78  0   32  4   15  3   ... 0   2   4   0   2   1   2   1   0   0


from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA 

Sc = StandardScaler()
X = Sc.fit_transform(df)
pca = PCA(3) 
pca.fit(X) 
pca_data = pd.DataFrame(pca.transform(X)) 
print(pca_data.head())

gives the following results:

I want to show a plot (cluster) of my dataset by using a PCA and interpret the results ? I am really new in this space and advice would be greatly appreciated!

Thanks in advance once again.

You want them 3D or 2D? 2D would be easier, but now you have 3D. — Frightera
Does this answer your question? How to plot clusters in python? — Flavia Giammarino

StupidWolf StupidWolf · Accepted Answer · 2021-02-15T09:54:46

Using an example dataset:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA 
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

df, y = make_blobs(n_samples=70, centers=10,n_features=26,random_state=999,cluster_std=1)

Perform scaling, PCA and put the PC scores into a dataframe:

Sc = StandardScaler()
X = Sc.fit_transform(df)
pca = PCA(2) 
pca_data = pd.DataFrame(pca.fit_transform(X),columns=['PC1','PC2'])

Perform kmeans and place the label into a data frame and you can already plot it using seaborn:

kmeans =KMeans(n_clusters=10).fit(X)
pca_data['cluster'] = pd.Categorical(kmeans.labels_)
sns.scatterplot(x="PC1",y="PC2",hue="cluster",data=pca_data)

Or matplotlib:

fig,ax = plt.subplots()
scatter = ax.scatter(pca_data['PC1'], pca_data['PC2'],c=pca_data['cluster'],cmap='Set3',alpha=0.7)
legend1 = ax.legend(*scatter.legend_elements(),
                    loc="upper left", title="")
ax.add_artist(legend1)

Visualizing clusters result using PCA (Python)

1 Answers