The function regsubsets from the leaps package treats each level of a categorical (factor) variable as an independent dummy variable. I would like to change this behavior.
Example using the iris dataset, where Species is a factor variable:
library(leaps)
data(iris)
models <- regsubsets(Sepal.Length ~ ., data = iris, nvmax = 4)
summary(models)
Subset selection object
Call: regsubsets.formula(Sepal.Length ~ ., data = iris, nvmax = 4)
5 Variables (and intercept)
                  Forced in Forced out
Sepal.Width           FALSE      FALSE
Petal.Length          FALSE      FALSE
Petal.Width           FALSE      FALSE
Speciesversicolor     FALSE      FALSE
Speciesvirginica      FALSE      FALSE
1 subsets of each size up to 4
Selection Algorithm: exhaustive
         Sepal.Width Petal.Length Petal.Width Speciesversicolor Speciesvirginica
1  ( 1 ) " "         "*"          " "         " "               " "
2  ( 1 ) "*"         "*"          " "         " "               " "
3  ( 1 ) "*"         "*"          "*"         " "               " "
4  ( 1 ) "*"         "*"          " "         "*"               "*"
Notice that regsubsets created the dummy variables Speciesversicolor and Speciesvirginica, which now take up two of the four variable slots in the fourth row. I would like Species to count as a single variable, so that it is included or excluded as a whole and takes up only one slot.
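The expansion itself can be seen without leaps. The snippet below is only an illustration in base R; my assumption is that regsubsets builds its candidates from the columns of this kind of design matrix rather than from the terms of the formula.

```r
# Build the design matrix R derives from the formula (base R only).
X <- model.matrix(Sepal.Length ~ ., data = iris)

# Species has three levels, so the default treatment contrasts produce
# two dummy columns: Speciesversicolor and Speciesvirginica.
print(colnames(X))

# The "assign" attribute maps every column back to its term in the formula;
# both Species columns share the same term index.
print(attr(X, "assign"))
```

The "assign" attribute shows that R still knows the two dummy columns belong to one term, which is the grouping I would like the subset search to respect.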
Is it possible to change this behavior of the regsubsets function?
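To make the goal concrete, here is a minimal brute-force sketch of the kind of result I am after. It does not use regsubsets at all: it fits every combination of whole terms with lm and keeps the one with the lowest residual sum of squares for each size. The helper name best_subsets_by_term is hypothetical (it is not part of leaps), and this approach is only practical for a handful of predictors.

```r
# Hypothetical helper, not part of leaps: exhaustive search over whole terms,
# so a factor such as Species counts as one candidate regardless of its levels.
best_subsets_by_term <- function(response, terms, data, nvmax = length(terms)) {
  sizes <- seq_len(min(nvmax, length(terms)))
  results <- lapply(sizes, function(k) {
    # All subsets of k terms, each returned as a character vector.
    combos <- combn(terms, k, simplify = FALSE)
    # Residual sum of squares of the lm fit for each subset.
    rss <- vapply(combos, function(vars) {
      fit <- lm(reformulate(vars, response = response), data = data)
      deviance(fit)
    }, numeric(1))
    best <- which.min(rss)
    data.frame(size = k,
               terms = paste(combos[[best]], collapse = " + "),
               rss = rss[best],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, results)
}

candidates <- setdiff(names(iris), "Sepal.Length")
res <- best_subsets_by_term("Sepal.Length", candidates, iris, nvmax = 4)
print(res)
```

Here Species occupies a single slot, so the size-4 row contains all four predictors. Ideally I would get this behavior from regsubsets itself, with its faster search, rather than from a loop like this.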
A similar question has been asked before, but most of the comments (and I) agree that the question remains unanswered: https://stats.stackexchange.com/questions/152158/r-model-selection-with-categorical-variables-using-leaps-and-glmnet
Here is another similar and unanswered question: R: can I get regsubsets() to in-/exclude variables by groups?