I have a dataframe, df, containing both text and numerical features, similar to the one shown below.
Feature 1  Feature 2  Feature 3      Feature 4  Label
10         20         keyword        Human      1
2          3          Keywords       Dog        0
8          2          Stackoverflow  cat        0
Currently I convert the text features into numerical features using the factorize function and then use the resulting dataframe for classification:
df['Feature 3'] = df['Feature 3'].factorize()[0]
df['Feature 4'] = df['Feature 4'].factorize()[0]
After running the above code, my dataframe looks like this:
Feature 1  Feature 2  Feature 3  Feature 4  Label
10         20         0          0          1
2          3          1          1          0
8          2          2          2          0
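For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the column names are exactly as shown in the table and rebuilds the three sample rows by hand; the real dataframe is larger.

```python
import pandas as pd

# Sample rows copied from the table in the question (assumed column names).
df = pd.DataFrame({
    "Feature 1": [10, 2, 8],
    "Feature 2": [20, 3, 2],
    "Feature 3": ["keyword", "Keywords", "Stackoverflow"],
    "Feature 4": ["Human", "Dog", "cat"],
    "Label": [1, 0, 0],
})

# factorize() gives each distinct string its own integer code, in order of
# first appearance, so "keyword" and "Keywords" end up with different codes.
for col in ["Feature 3", "Feature 4"]:
    df[col] = df[col].factorize()[0]

print(df)
```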
The factorize function treats 'keywords' and 'keyword' as different words. Is there a function that treats similar words such as 'keywords' and 'keyword' as the same word?
The output dataframe should actually look like this:
Feature 1  Feature 2  Feature 3  Feature 4  Label
10         20         0          0          1
2          3          0          1          0
8          2          1          2          0
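To make the target concrete, here is a rough sketch of one way this output could be produced on the sample rows: normalize each string before calling factorize. The helper normalize is hypothetical and only an assumption about what "similar" means here (same word after lowercasing and dropping one trailing "s"); it is a crude rule, not a proper stemmer, and it would mishandle words that legitimately end in "s".

```python
import pandas as pd

# Sample rows copied from the table in the question (assumed column names).
df = pd.DataFrame({
    "Feature 1": [10, 2, 8],
    "Feature 2": [20, 3, 2],
    "Feature 3": ["keyword", "Keywords", "Stackoverflow"],
    "Feature 4": ["Human", "Dog", "cat"],
    "Label": [1, 0, 0],
})


def normalize(text: str) -> str:
    """Hypothetical helper: lowercase and drop a single trailing 's'."""
    text = text.strip().lower()
    if len(text) > 3 and text.endswith("s"):
        return text[:-1]
    return text


# Normalize first, then factorize, so variants that normalize to the same
# string share one integer code.
for col in ["Feature 3", "Feature 4"]:
    df[col] = df[col].map(normalize).factorize()[0]

print(df)
```

On the three sample rows this gives Feature 3 the codes 0, 0, 1 and Feature 4 the codes 0, 1, 2, which matches the table above. A real stemmer or lemmatizer could be swapped in for normalize if the naive rule turns out to be too coarse.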