For a research paper, I will be using a lasso (L1-regularized) model to perform classification and feature selection. I plan to one-hot encode my categorical data, so I need to know which encoded feature maps to which original categorical value in order to determine which features were ultimately selected for the final model. I've been googling this question for a while but have not found an answer.
How does scikit-learn's one-hot encoding assign values? For example, say the categorical values for a certain variable are {1, 2, 3, 4}. Does one-hot encoding organize them into dummies in sorted (ascending) order, i.e. it drops 1, makes the first dummy for value 2, the second dummy for value 3, and the third dummy for value 4? Or does it assign them based on the order in which it encounters the distinct values as it scans down the rows (e.g. the first observation has value 3 and the second observation has value 2, so 3 is dropped and the first dummy becomes value 2)?
Thanks!