
I'm dealing with predictors that I'd like to treat as factors. Unfortunately, the data, which represent answers to multiple choice questions, are stored as integers, and so when I fit a linear model, R treats these as numeric predictors rather than factors. I do not want to type out factor(x) every time; how would I automatically code the predictors as factor variables?

Example of data that I might have:

  a b response
1 1 T 6.946486
2 2 F 1.952378
3 3 T 5.189918
4 1 T 2.680438
5 2 F 2.243461
6 3 T 5.398814
7 1 T 2.375182
8 2 F 0.376323
9 3 T 5.144803

Desired task: tell R that predictor a should be treated as a factor variable, without having to type out lm(response ~ factor(a) + b). Maybe I need to iterate through each column, convert it to a factor, and then pass the data to lm? Or maybe there is an argument I can pass to lm? Still trying different things...
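To make the problem concrete, here is a minimal sketch using the a and response columns from the example data above. Column b is left out on purpose: in this small sample b is TRUE exactly when a is not 2, so it would be collinear with the factor dummies and lm() would report an NA coefficient for it.

```r
# Columns a and response from the example data (b omitted, see above)
dat <- data.frame(
  a = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
  response = c(6.946486, 1.952378, 5.189918, 2.680438, 2.243461,
               5.398814, 2.375182, 0.376323, 5.144803)
)

# Integer a: lm() fits a single slope for a
fit_num <- lm(response ~ a, data = dat)
names(coef(fit_num))  # [1] "(Intercept)" "a"

# factor(a): lm() fits one dummy per non-reference level (treatment contrasts)
fit_fac <- lm(response ~ factor(a), data = dat)
names(coef(fit_fac))  # [1] "(Intercept)" "factor(a)2" "factor(a)3"
```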

data$a <- as.factor(data$a); lm(response ~ a + b, data = data)? ...which is actually more typing, and now you've changed your entire data set. Why do you want to do this? – Rich Scriven
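One way to sidestep the concern about changing the data set is to convert the columns on a copy and pass only that copy to lm(). This is a sketch under assumptions: the helper factorize_cols is a hypothetical name, not a base R function, and only the a and response columns of the question's data are used.

```r
# Hypothetical helper (not part of base R): returns a copy of `df` with the
# columns named in `cols` converted to factors. Because of R's copy-on-modify
# semantics, the caller's data frame is not changed.
factorize_cols <- function(df, cols) {
  df[cols] <- lapply(df[cols], factor)
  df
}

# Columns a and response from the question's example data
dat <- data.frame(
  a = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
  response = c(6.946486, 1.952378, 5.189918, 2.680438, 2.243461,
               5.398814, 2.375182, 0.376323, 5.144803)
)

# a is a factor only inside this call
fit <- lm(response ~ a, data = factorize_cols(dat, "a"))
names(coef(fit))  # [1] "(Intercept)" "a2" "a3"
class(dat$a)      # [1] "integer" (the original column is unchanged)
```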

1 Answer


It may be simplest to convert all answers to multiple choice questions (MCQs) to factors before passing the data frame to lm. Assuming that all integer variables are MCQ answers, you can find them with sapply and is.integer, then convert them with lapply:

## making up data
N <- 20
d <- data.frame(a = sample(3, N, replace = TRUE),
                b = sample(3, N, replace = TRUE),
                c = sample(3, N, replace = TRUE),
                d = sample(c(TRUE, FALSE), N, replace = TRUE),
                e = sample(c(TRUE, FALSE), N, replace = TRUE),
                f = sample(3, N, replace = TRUE),
                response = rnorm(N, 0, 2))

## determine which columns are integer
int_col <- which(sapply(d, is.integer))

## convert all integer variables to factor variables;
## lapply() keeps each column a factor, whereas sapply() would
## simplify the result into a matrix and drop the factor class
d[int_col] <- lapply(d[int_col], factor)
str(d)

If you have integer variables that are not MCQ answers, you'll have to adjust int_col manually to exclude them.
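Excluding those columns can be done by name instead of by position. A minimal sketch, assuming a hypothetical integer column age that is not an MCQ answer; the column names q1, q2, and age are made up for illustration.

```r
## made-up data: q1 and q2 are MCQ answers, age is an integer that is not
N <- 20
d <- data.frame(q1 = sample(3, N, replace = TRUE),
                q2 = sample(3, N, replace = TRUE),
                age = sample(18:65, N, replace = TRUE),  # hypothetical non-MCQ integer
                response = rnorm(N))

## integer columns that should stay numeric
not_mcq <- c("age")

## all integer columns, minus the excluded ones
int_cols <- names(d)[vapply(d, is.integer, logical(1))]
mcq_cols <- setdiff(int_cols, not_mcq)

d[mcq_cols] <- lapply(d[mcq_cols], factor)
sapply(d, class)
```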