I have a CSV column containing fruit names that I want to convert into an array.
Sample csv column:
Names:
Apple
Banana
Pear
Watermelon
Jackfruit
..
..
..
There are around 400 fruit names in the column.
I have applied one-hot encoding to it, but I am unable to display the column names (each fruit name from a row of the CSV column).
My code so far is:
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
dataset = pd.read_csv('D:/fruits.csv')
X = dataset.iloc[:, 0].values
labelencoder_X = LabelEncoder()
D = labelencoder_X.fit_transform(X)
D = D.reshape(-1, 1)
onehotencoder = OneHotEncoder(sparse=False, categorical_features=[0])
X = onehotencoder.fit_transform(D)
This converts the column into a NumPy array, but the column names come out as [0 1 2 3 .. ..]. I want them to be the fruit names from the CSV rows, for example [Apple Banana Pear Watermelon .. ..].
How can I retain the column names after one-hot encoding?
.values changes the DataFrame to a NumPy array, which doesn't support string column names. You can try X = pd.DataFrame(X, columns=dataset.columns)
– Sachin Prabhu
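A minimal sketch of one way to keep the fruit names as column labels. Note that dataset.columns holds the CSV header(s) rather than the individual fruit names, so the labels have to come from the encoded categories instead. The small in-memory DataFrame below is a hypothetical stand-in for fruits.csv, and the header name "Names" is taken from the sample above. Two options are shown: pandas' get_dummies, and scikit-learn's OneHotEncoder with its categories_ attribute (this assumes a scikit-learn version whose OneHotEncoder accepts string input directly).

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for pd.read_csv('D:/fruits.csv'); the real file has ~400 names.
dataset = pd.DataFrame(
    {"Names": ["Apple", "Banana", "Pear", "Watermelon", "Jackfruit", "Apple"]}
)

# Option 1: pandas only. Each distinct fruit becomes a column labelled with its name.
dummies = pd.get_dummies(dataset["Names"], dtype=int)

# Option 2: scikit-learn. Pass a 2-D selection (double brackets) so the encoder
# sees a single feature column; no separate LabelEncoder step is needed here.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(dataset[["Names"]])  # sparse matrix by default

# categories_[0] lists the fruit names in the same order as the output columns.
fruit_names = encoder.categories_[0]
one_hot = pd.DataFrame(encoded.toarray(), columns=fruit_names, index=dataset.index)

print(one_hot.head())
```

If you keep the LabelEncoder from the question, labelencoder_X.classes_ should serve the same purpose, since it lists the names in the order of the integer codes: pd.DataFrame(X, columns=labelencoder_X.classes_).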