2
votes

In the sklearn 0.20.3 documentation (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html), OneHotEncoder's __init__ has a drop parameter, but when I pass it I get a TypeError.

I couldn't find any examples that use the drop keyword; most of the examples I have seen target older versions of sklearn. Some of them use ColumnTransformer, but even those rely on the older OneHotEncoder API, which raises a FutureWarning.

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])

onehotencoder = OneHotEncoder(categories = [0],handle_unknown='ignore',drop=[0])

Expected result: the code above should run without errors.
Actual result: TypeError: __init__() got an unexpected keyword argument 'drop'


3 Answers

0
votes

Try this (drop has to be passed as a keyword argument, and it is only available from scikit-learn 0.21 onward):

onehotencoder = OneHotEncoder(categories='auto', drop='first')

For reference, the docs describe the accepted values of drop as follows:
None : retain all features (the default).
‘first’ : drop the first category in each feature. If only one category is present, the feature will be dropped entirely.
array : drop[i] is the category in feature X[:, i] that should be dropped.
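
To make the three options concrete, here is a minimal pure-Python sketch of the drop semantics for a single feature. It is only an illustration, not scikit-learn's implementation: the helper name one_hot_single_feature is hypothetical, categories are assumed to be ordered by sorting, and the "array" case is reduced to one category value because there is only one feature.

```python
def one_hot_single_feature(values, drop=None):
    """Illustrate the `drop` options for one feature (hypothetical helper).

    drop=None    -> keep every category
    drop="first" -> drop the first category (in sorted order)
    drop=<value> -> drop that specific category
    Note: a category literally named "first" cannot be dropped by value here.
    """
    categories = sorted(set(values))  # assumption: categories are sorted
    if drop is None:
        kept = categories
    elif drop == "first":
        kept = categories[1:]
    else:
        if drop not in categories:
            raise ValueError(f"category {drop!r} not found in the feature")
        kept = [c for c in categories if c != drop]
    # One output column per kept category; a dropped category becomes all zeros.
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


colors = ["red", "green", "blue", "green"]
for option in (None, "first", "green"):
    kept, rows = one_hot_single_feature(colors, drop=option)
    print(option, kept, rows)
```

With drop="first" and a feature that has only one category, kept is empty and every row has zero columns, which matches the note that such a feature is dropped entirely.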




# Own implementation of One Hot Encoding - Data Transformation
def convert_to_binary(df, column_to_convert):
    categories = list(df[column_to_convert].drop_duplicates())

    for category in categories:
        cat_name = str(category).replace(" ", "_").replace("(", "").replace(")", "").replace("/", "_").replace("-", "").lower()
        col_name = column_to_convert[:5] + '_' + cat_name[:10]
        df[col_name] = 0
        df.loc[(df[column_to_convert] == category), col_name] = 1

    return df

# One Hot Encoding
print("One Hot Encoding categorical data...")
columns_to_convert = ['col1', 'col2']  # Placeholder names: list the columns you want to one-hot encode.

for column in df_all.columns:              # or: for column in columns_to_convert
    if df_all[column].dtype.name == 'category':
        df_all = convert_to_binary(df=df_all, column_to_convert=column)
        df_all.drop(column, axis=1, inplace=True)
print("One Hot Encoding categorical data...completed")

If you don't want every categorical column converted, list the columns you want in columns_to_convert and loop over that list instead of df_all.columns.

0
votes

The link you provided is not for version 0.20 but for the latest release. Check the 0.20.3 documentation instead: https://scikit-learn.org/0.20/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder The drop argument is not documented for that version; it was added in scikit-learn 0.21. So the solution is to update to the latest version of scikit-learn.

0
votes

Upgrading your current version of scikit-learn should resolve the issue:

python -m pip install --user --upgrade scikit-learn
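
After upgrading, one way to confirm that the interpreter you are running actually picks up a version with drop is to inspect the constructor signature. This is a small sketch; the helper name accepts_keyword is made up for illustration, and it relies only on the standard inspect module plus whatever scikit-learn is installed.

```python
import inspect


def accepts_keyword(cls, name):
    """Return True if cls's constructor accepts `name` as a keyword (hypothetical helper)."""
    params = inspect.signature(cls).parameters
    has_var_keyword = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    return name in params or has_var_keyword


try:
    import sklearn
    from sklearn.preprocessing import OneHotEncoder
except ImportError:
    print("scikit-learn is not installed in this environment")
else:
    print("scikit-learn version:", sklearn.__version__)
    print("OneHotEncoder accepts drop:", accepts_keyword(OneHotEncoder, "drop"))
```

If this still reports an old version after the upgrade, the pip command and your script are probably using different Python environments.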