
Is there a way, after one-hot encoding my categorical columns, to fill the encoded columns with a value from another column instead of the binary "1" that marks the category present in each row?

My dataframe looks like this:

ID  Location    Amount  Quantity
1   TEXAS       12342   1
2   CALIFORNIA  23423   4

After label and one-hot encoding, I get this:

ID  Location_TEXAS  Location_CALIFORNIA  Amount  Quantity
1   1               0                    12342   1
2   0               1                    23423   4
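For reference, pandas can produce the same one-hot table directly from the DataFrame, without a separate LabelEncoder step. A minimal sketch, assuming the data above is loaded into a DataFrame named `df`. Note that `get_dummies` orders the new columns alphabetically by category and places them after the remaining columns:

```python
import pandas as pd

# The example data from the question
df = pd.DataFrame({
    "ID": [1, 2],
    "Location": ["TEXAS", "CALIFORNIA"],
    "Amount": [12342, 23423],
    "Quantity": [1, 4],
})

# Replace Location with one indicator column per category;
# dtype=int gives 0/1 instead of the bool default of recent pandas versions
encoded = pd.get_dummies(df, columns=["Location"], dtype=int)
print(encoded)
```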

Is it possible to have the Amount in the encoded columns instead of the binary values?

Desired result:

ID  Location_TEXAS  Location_CALIFORNIA  Amount  Quantity
1   12342           0                    12342   1
2   0               23423                23423   4

After that, I can drop the Amount column entirely.

This is the code I used for label encoding and one-hot encoding:

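 import numpy as np
 from sklearn.compose import ColumnTransformer
 from sklearn.preprocessing import LabelEncoder, OneHotEncoder
 X = np.array([["TEXAS", 12342, 1], ["CALIFORNIA", 23423, 4]], dtype=object)  # Location, Amount, Quantity
 labelencoder_X = LabelEncoder()
 X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
 # categorical_features=[0] was removed in scikit-learn 0.22; ColumnTransformer replaces it
 onehotencoder = ColumnTransformer([("location", OneHotEncoder(), [0])],
                                   remainder="passthrough", sparse_threshold=0)
 X = onehotencoder.fit_transform(X)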

Please let me know if this is possible. Any help would be appreciated.


Answer


This would defeat the purpose of one-hot encoding your state variable. The idea of one-hot encoding (OHE) is that, for each observation, exactly one of the encoded features is "hot" (equal to 1) and all the others are 0.
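Mechanically, the table you want is an element-wise product of the one-hot block with the Amount column. The sketch below uses the two rows from the question (the variable names are illustrative). It checks the one-hot property and shows that, after scaling, the entries carry the amount rather than a 0/1 flag:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

location = np.array(["TEXAS", "CALIFORNIA"])
amount = np.array([12342, 23423])

# OneHotEncoder expects 2-D input, and its output is sparse by default
enc = OneHotEncoder()
one_hot = enc.fit_transform(location.reshape(-1, 1)).toarray()

# One-hot property: exactly one "hot" entry per row, so every row sums to 1
print(one_hot.sum(axis=1))

# Broadcasting amount as an (n, 1) column scales each row by its Amount,
# giving the requested table (column order is enc.categories_[0])
scaled = one_hot * amount[:, np.newaxis]

# Rows now sum to the amount itself: the columns no longer encode
# category membership alone, they mix it with the amount
print(scaled.sum(axis=1))
```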

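Additionally, each new column would effectively be an Amount × location interaction term. Once the Amount column is dropped, the same weight has to account for both amount and location, so the model cannot weight them independently. In particular, it cannot learn a per-location effect that does not scale with the amount. Without a very specific reason for doing this, I would say it's not a good idea.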
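If you do want the effect of amount to differ by location, a common alternative is to keep Amount and the 0/1 indicators and add the products as extra interaction columns. That way the model can still weight location on its own. A minimal pandas sketch, assuming the question's data is in a DataFrame named `df`. The `_x_Amount` suffix is just an illustrative naming choice:

```python
import pandas as pd

df = pd.DataFrame({
    "Location": ["TEXAS", "CALIFORNIA"],
    "Amount": [12342, 23423],
    "Quantity": [1, 4],
})

# Ordinary 0/1 indicator columns, kept as-is
dummies = pd.get_dummies(df["Location"], prefix="Location", dtype=int)

# Amount x location interaction columns (the values the question asked for),
# multiplied row by row by aligning on the index
interactions = dummies.mul(df["Amount"], axis=0).add_suffix("_x_Amount")

# Keep Amount, the indicators and the interactions side by side
result = pd.concat([df.drop(columns="Location"), dummies, interactions], axis=1)
print(result)
```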