I am running a regression on nested data with factor variables. If any group contains only one observed level of a factor, the regression fails with the error "contrasts can be applied only to factors with 2 or more levels". For example:

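library(tidyverse)  # loads dplyr, tidyr and purrr

data <- mtcars %>% mutate(am = if_else(carb == 1, 1, am),  # the carb == 1 group gets a single am level
                          am = as.factor(am))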

data_carb <- data %>%
  group_by(carb) %>% 
  nest()

X <- c("cyl", "disp", "hp" , "drat", "wt", "qsec", "vs", "am", "gear")
Y <- "mpg"

generic_model <- function(df) {
  lm(reformulate(X, Y), data = df)
}

modelondata <-  data_carb %>% 
  mutate(model = data %>% map(generic_model),
         coeff  = model %>% map(broom::tidy)) %>% 
  unnest(coeff, .drop = TRUE)

How can I keep the variable as a factor and still get output for at least those groups in which the factor has more than one level, i.e. for carb != 1?

In my real data, I have many factor variables with dozens of levels, and the regression fails if even one group has a constant factor. I don't want to drop those variables entirely, because I would lose insight into the other groups.

tryCatch(lm(...)) ? - rawr
Can you advise how I can do that in the code? - Geet
tryCatch(lm(reformulate(X, Y), data = df), error = function(e) NULL), everything else the same. Or use function(e) NA if you want to keep the grouping levels. - rawr
@rawr: It worked. Thanks! - Geet
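
The tryCatch suggestion from the comments could be wired into the nested pipeline roughly as follows. This is a minimal sketch, not necessarily the exact code the asker ended up with: `safe_model`, `models` and `fitted` are hypothetical names introduced here, and the data setup simply repeats the question's example.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same setup as in the question: the carb == 1 group has a single am level
data <- mtcars %>%
  mutate(am = as.factor(if_else(carb == 1, 1, am)))

X <- c("cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear")
Y <- "mpg"

# safe_model (hypothetical name): returns NULL instead of stopping
# when lm() throws an error for a group
safe_model <- function(df) {
  tryCatch(lm(reformulate(X, Y), data = df), error = function(e) NULL)
}

models <- data %>%
  group_by(carb) %>%
  nest() %>%
  mutate(model = map(data, safe_model)) %>%
  ungroup()

# Groups whose fit failed hold NULL; drop them before tidying the rest
fitted <- models %>% filter(!map_lgl(model, is.null))
```

With this approach a failing group loses its whole model, not just the constant factor, so it suits cases where skipping those groups is acceptable.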

1 Answer

What if you created a function to drop columns with "fixed" factors, i.e. factors that take only one value within a group?

drop_fixed_factors <- function(x) {
  # keep non-factor columns, and factors with at least 2 distinct values
  x %>% keep(~ !is.factor(.x) || length(unique(.x)) > 1)
}

Then do something like this:

generic_model <- function(df) {
  good_data <- df[X] %>% drop_fixed_factors()
  lm(reformulate(names(good_data), Y), data = df)
}

That way each group's model drops only the factors that are constant within that group and keeps every other column.
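
Put together with the question's pipeline, the idea might look like the sketch below. It assumes a factor counts as usable when it has more than one distinct value in the group, and it replaces the question's `unnest(coeff, .drop = TRUE)` with an explicit `select()` followed by `unnest()`; both choices are assumptions of this sketch rather than part of the original posts.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same setup as in the question
data <- mtcars %>%
  mutate(am = as.factor(if_else(carb == 1, 1, am)))

X <- c("cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear")
Y <- "mpg"

# Keep non-factor columns and factors with more than one distinct value
drop_fixed_factors <- function(x) {
  x %>% keep(~ !is.factor(.x) || length(unique(.x)) > 1)
}

# Build the formula per group from the predictors that survive the filter
generic_model <- function(df) {
  good_data <- df[X] %>% drop_fixed_factors()
  lm(reformulate(names(good_data), Y), data = df)
}

modelondata <- data %>%
  group_by(carb) %>%
  nest() %>%
  mutate(model = map(data, generic_model),
         coeff = map(model, broom::tidy)) %>%
  ungroup() %>%
  select(carb, coeff) %>%   # keep only the group key and tidy coefficients
  unnest(coeff)
```

Every carb group now gets a model; groups where am is constant are simply fitted without am. Very small groups can still yield NA coefficients, since they have fewer rows than predictors.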