0
votes

I'm following the Machine Learning course on Udemy, and the lecture on encoding categorical data provides a CSV file to code along with. The contents of the file are fairly simple: (screenshot of the data)

When I create the matrix of features, the values come through as they are. But after applying OneHotEncoder from sklearn (via ColumnTransformer), the "country" column is split into 3 columns and the values are displayed as shown below: (screenshot of matrix of features)

The lecturer, however, gets values with a single decimal place for the same data and the same code. I can't tell whether I am doing something wrong or whether a change in library version is responsible. How can I get values with a single decimal place instead of scientific notation with trailing zeros?

The code for encoding:

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# One-hot encode column 0 (country); pass the remaining columns through unchanged
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = np.array(ct.fit_transform(X))

Edit:

Expected output for row 1:

[1.0 0.0 0.0 44.0 72000.0]

Output that I get:

[1.00000000e+00 0.00000000e+00 0.00000000e+00 4.40000000e+01 7.20000000e+04]
1
Hey, can you show an example of what you want and what you get? I'm not sure if you want vectors that look like [1,0,0] or something else. Also, do you want to one-hot encode only the country column? - Green
Hi, @Green Yes, that's exactly what I am hoping to get. The lecturer gets: [1.0 0.0 0.0 44.0 72000.0] for the row that has the data [France 44 72000] - Mursil Khan
I answered the previous thing; this is a bit different. Please edit the question so it is clear. Just give the exact input and output - Green
Yes, sure! I edited the question with the example - Mursil Khan
It's just a matter of how it gets displayed. 1.0 and 1.0000e+00 are both exactly the same data type and value. - Swier

1 Answer

2
votes

Forget everything I wrote. Just add this line before printing if you want to see the values without scientific notation:

np.set_printoptions(suppress=True)  # display only, just to print nicely; you may remove it

I was completely confused by the question...
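
To make the point concrete, here is a minimal sketch, assuming a hypothetical `row` array that mirrors the example row from the question (it is not read from the course CSV). It compares NumPy's default text for that row with the suppressed form, and checks that the two notations denote the same number. The variable names `default_text`, `suppressed_text`, and `same_value` are only for illustration.

```python
import numpy as np

# Hypothetical row mirroring the question: one-hot country, age, salary.
row = np.array([1.0, 0.0, 0.0, 44.0, 72000.0])

# Default formatting. As far as I understand NumPy's heuristic, it switches to
# scientific notation when the largest absolute value is more than 1000 times
# the smallest non-zero one, which is the case here (72000 vs 1).
default_text = np.array2string(row)

# suppress_small=True requests fixed-point notation for this call only.
suppressed_text = np.array2string(row, suppress_small=True)

print(default_text)
print(suppressed_text)

# Same effect as np.set_printoptions(suppress=True), but scoped to the block.
with np.printoptions(suppress=True):
    print(row)

# Only the display differs: both notations parse to the same float.
same_value = float("7.20000000e+04") == 72000.0
print(same_value)
```

Using `np.printoptions` as a context manager keeps the change local, whereas `np.set_printoptions` changes printing for the rest of the session. Either way, the array's contents and dtype are not modified.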