I need to create a K-means algorithm for the zoo.csv data from https://archive.ics.uci.edu/ml/datasets/Zoo. One part of the code should find a suitable number of clusters (using the elbow method), and another part should test a given number of clusters (n_clusters). The problem is that the values in the anim_name column of the CSV file are strings (aardvark, antelope, etc.), and when I run the code I get this error: "ValueError: could not convert string to float: 'aardvark'". How can I convert the values of the anim_name column to float (or int) so that the algorithm works? I have tried several approaches, but nothing has worked so far.
Here is my code so far (I am doing this in Google Colab):
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn import metrics
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.cluster import KMeans
%matplotlib inline
from sklearn.preprocessing import LabelEncoder
from sklearn import preprocessing
from google.colab import drive
drive.mount('/content/drive')
data=pd.read_csv('/content/drive/MyDrive/MyFiles/zoo[1].csv', delimiter=',')
data.head()
kmeans=KMeans(n_clusters=2,max_iter=300)
kmeans.fit(data)
y_km=kmeans.predict(data)
clusters=kmeans.labels_
data['clusters']=clusters
data
After the previous part I get this error message:
/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84
     85
ValueError: could not convert string to float: 'aardvark'
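One possible way around this error, sketched under assumptions: if the animal name is treated as an identifier rather than a feature, it can simply be left out of the matrix passed to KMeans instead of being converted to a number. The small DataFrame below is a made-up stand-in for zoo.csv; only the anim_name column name comes from the question, and the other column names and values are illustrative.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Tiny made-up stand-in for zoo.csv; only 'anim_name' is taken from the question.
data = pd.DataFrame({
    "anim_name": ["aardvark", "antelope", "bass", "carp", "chicken", "crow"],
    "hair":      [1, 1, 0, 0, 0, 0],
    "feathers":  [0, 0, 0, 0, 1, 1],
    "legs":      [4, 4, 0, 0, 2, 2],
})

# Keep only numeric columns so the string identifier never reaches KMeans.
features = data.select_dtypes(include="number")

kmeans = KMeans(n_clusters=2, max_iter=300, n_init=10, random_state=0)
data["clusters"] = kmeans.fit_predict(features)

# Average the numeric features per cluster (the name column is dropped explicitly).
res1 = data.drop(columns="anim_name").groupby("clusters").mean().round(2)
print(res1)
```

Encoding the names with LabelEncoder would also make the error go away, but the resulting integers are arbitrary, and K-means would treat their differences as real distances. Whether that is acceptable depends on the data, so it is worth checking before relying on it.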
res1=np.round(data.groupby('clusters').mean(),2)
pd.DataFrame(res1)
scores = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i)
    kmeans.fit(data)
    scores.append(kmeans.inertia_)
plt.plot(range(1, 11), scores)
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('Scores')
plt.show()
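The elbow loop can be sketched as a self-contained example, again assuming that only numeric columns are clustered. The data here is synthetic (generated with make_blobs as a placeholder), not the zoo data, so the printed inertia values say nothing about the real dataset.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic numeric data standing in for the numeric zoo columns (an assumption).
X, _ = make_blobs(n_samples=100, centers=3, n_features=4, random_state=0)

scores = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(X)
    scores.append(km.inertia_)  # within-cluster sum of squared distances

# Print the inertia for each k; the "elbow" is where the decrease flattens out.
for k, s in zip(range(1, 11), scores):
    print(k, round(s, 1))
```

The resulting scores list can be passed to the same plt.plot(range(1, 11), scores) call used in the question. With the real data, the matrix given to fit would need to exclude both the anim_name column and the clusters column added earlier, otherwise the previous cluster labels would influence the new clustering.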