With dplyr, I want to convert numeric variables into factors when their number of distinct values (levels) is below a given threshold.
This is mainly useful for binary variables coded numerically as 0/1.
Example data:
library(dplyr)
threshold <- 5
data <- data.frame(binary1 = rep(c(0, 1), 5), binary_2 = sample(c(0, 1), 10, replace = TRUE), multilevel = sample(1:4, 10, replace = TRUE), numerical = 1:10)
> data
   binary1 binary_2 multilevel numerical
1        0        1          2         1
2        1        0          3         2
3        0        1          2         3
4        1        0          1         4
5        0        1          2         5
6        1        1          4         6
7        0        1          1         7
8        1        1          3         8
9        0        1          1         9
10       1        0          4        10
sapply(data, class)
binary1 binary_2 multilevel numerical
"numeric" "numeric" "integer" "integer"
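To see which columns the threshold would actually select, one option is to count the distinct values in each column. This is a minimal sketch, assuming dplyr is loaded for n_distinct(); `n_levels` is a hypothetical helper name used only for illustration:

```r
library(dplyr)

threshold <- 5
data <- data.frame(
  binary1    = rep(c(0, 1), 5),
  binary_2   = sample(c(0, 1), 10, replace = TRUE),
  multilevel = sample(1:4, 10, replace = TRUE),
  numerical  = 1:10
)

# Number of distinct values in each column (hypothetical helper variable)
n_levels <- sapply(data, n_distinct)
n_levels

# Columns that fall below the threshold and should become factors
names(n_levels)[n_levels < threshold]
```

Only `numerical` (10 distinct values) stays at or above the threshold, whatever values sample() happens to draw.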
I can easily convert all numeric variables to factors with mutate(), across() and where(), like this:
data <- data %>% mutate(across(where(is.numeric), as.factor))
> sapply(data, class)
binary1 binary_2 multilevel numerical
"factor" "factor" "factor" "factor"
However, I can't find a way to give where() multiple conditions, including my threshold. This is the output I want:
sapply(data, class)
binary1 binary_2 multilevel numerical
"factor" "factor" "factor" "integer"
I tried the following, but it failed:
data%>%mutate(across(where(is.numeric & length(unique(.x))<threshold), as.factor))
Error message:
Error: Problem with `mutate()` input `..1`.
x object '.x' not found
ℹ Input `..1` is `across(where(!is.factor & length(unique(.x)) < threshold), as.factor)`.
Run `rlang::last_error()` to see where the error occurred.
Maybe I don't understand across() and where() well enough. Suggestions are welcome.
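A likely explanation for the `object '.x' not found` error: where() expects a predicate function, but `is.numeric & length(unique(.x)) < threshold` is an ordinary expression, and `.x` only exists inside a purrr-style lambda (`~ ...`). The sketch below wraps both conditions in one lambda; it assumes dplyr 1.0.0 or later (where across() and where() are available):

```r
library(dplyr)

threshold <- 5
data <- data.frame(
  binary1    = rep(c(0, 1), 5),
  binary_2   = sample(c(0, 1), 10, replace = TRUE),
  multilevel = sample(1:4, 10, replace = TRUE),
  numerical  = 1:10
)

# where() calls the lambda once per column, passing the column as .x.
# && is safe because each condition returns a single TRUE/FALSE.
data <- data %>%
  mutate(across(where(~ is.numeric(.x) && n_distinct(.x) < threshold), as.factor))

# binary1, binary_2 and multilevel become factors; numerical stays integer
sapply(data, class)
```

An ordinary anonymous function, `where(function(x) is.numeric(x) && length(unique(x)) < threshold)`, should behave the same way if you prefer to avoid the formula syntax.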
Additional question: why does putting the negation operator (!) before is.factor give an error, when the version without it works fine?
data<-data%>%mutate(across(where(!is.factor), as.factor))
Error: Problem with `mutate()` input `..1`.
x invalid argument type
ℹ Input `..1` is `across(where(!is.factor), as.factor)`.
Run `rlang::last_error()` to see where the error occurred.
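The `!is.factor` error most likely arises because `!` is applied to the function object is.factor itself, not to its TRUE/FALSE result, and `!` only accepts logical or numeric values ("invalid argument type"). A sketch of two ways to negate the predicate instead, using a lambda or base R's Negate(); the `group` column and the names `data1`/`data2` are hypothetical, added only to have a column that is already a factor:

```r
library(dplyr)

data <- data.frame(
  binary1   = rep(c(0, 1), 5),
  numerical = 1:10,
  group     = factor(rep(c("a", "b"), 5))  # already a factor
)

# !is.factor fails: ! cannot be applied to a function
# try(!is.factor)

# Option 1: negate the result inside a lambda
data1 <- data %>% mutate(across(where(~ !is.factor(.x)), as.factor))

# Option 2: Negate() returns a new function with the negated result
data2 <- data %>% mutate(across(where(Negate(is.factor)), as.factor))

sapply(data1, class)
```

Both versions pass where() a function, which is what it needs; the plain `where(is.factor)` works for the same reason.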