Pre-processing in data mining sometimes involves regrouping and recoding categorical variables. Once you recode a categorical variable in R (e.g. with plyr's mapvalues function), you need to rebuild it with df$variable <- factor(df$variable), which drops the levels that no longer occur, so that str(df) shows the real number of levels in your data.frame.
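To make the point concrete, here is a minimal sketch on a made-up factor (the name `status` and its values are hypothetical). Recoding one category into another changes the values, but the old level stays declared with zero observations until `factor()` is called again. `droplevels(status)` would give the same result as the last step.

```r
# Hypothetical factor with three categories
status <- factor(c("low", "mid", "high", "mid"))

# Recode "high" as "mid": the value changes, but "high" stays a declared level
status[status == "high"] <- "mid"
nlevels(status)   # 3
table(status)     # "high" is still listed, with a count of 0

# Re-creating the factor keeps only the levels that actually occur
status <- factor(status)
nlevels(status)   # 2
levels(status)    # "low" "mid"
```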
I have written a piece of code that automatically updates every categorical variable in a dataset:
cat <- sapply(df, is.factor)          #Select categorical variables
names(df)[cat]                        #View which they are
df[cat] <- lapply(df[cat], factor)    #Re-create each factor, dropping unused levels
                                      #(droplevels(df) does the same in one call)
str(df)                               #Check
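A minimal sketch of why `lapply()` is used here rather than `apply()`, on a hypothetical data frame (`colour`, `size`, `weight`, `is_cat`, `df1` and `df2` are made up): `lapply()` hands each column to `factor()` as a factor and returns a list of factors, whereas `apply()` would first turn the data frame into a character matrix. The sketch also compares the result with base R's `droplevels()`, which drops unused levels from all factor columns of a data frame at once.

```r
# Hypothetical data frame with two factor columns and one numeric column
df <- data.frame(
  colour = factor(c("red", "blue", "blue")),
  size   = factor(c("S", "M", "L")),
  weight = c(1.2, 3.4, 5.6)
)

# Drop the first row: "red" and "S" no longer occur but remain declared levels
df <- df[-1, ]

# Option 1: rebuild every factor column; lapply() keeps each column a factor
is_cat <- sapply(df, is.factor)
df1 <- df
df1[is_cat] <- lapply(df1[is_cat], factor)

# Option 2: droplevels() applies the same clean-up to all factor columns
df2 <- droplevels(df)

sapply(df1[is_cat], nlevels)   # colour 1, size 2
identical(df1, df2)            # TRUE
```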
My question is: once the dataset has been updated, how can I select the columns whose number of levels is exactly 1? I have tried these lines, without luck:
cat <- sapply(df, is.factor)               #Select categorical variables
categorical <- df[,cat]                    #Create a df named "categorical" holding only them
A <- function(x) nlevels(x)==1             #Create "A" function for apply
x <- data.frame(apply(categorical,2, A))   #Run apply function
utils::View(x)                             #Check, and see that it is not working...
Thanks for your help and time.
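The attempt above fails because `apply()` works on matrices: given a data frame, it first calls `as.matrix()`, which turns factor columns into character columns, so `nlevels()` sees no levels and returns 0 for every column. A minimal sketch on a hypothetical data frame (`categorical` with made-up columns `a` and `b`; `via_apply` and `via_vapply` are illustrative names) shows the difference with `vapply()`, which iterates over the data frame's columns directly.

```r
# Hypothetical data frame made only of factor columns
categorical <- data.frame(
  a = factor(c("x", "x")),
  b = factor(c("y", "z"))
)

# apply() coerces the data frame with as.matrix(), giving a character matrix,
# so the factor levels are gone before nlevels() is called
class(as.matrix(categorical)[, "a"])                     # "character"
via_apply <- apply(categorical, 2, nlevels)              # a = 0, b = 0

# vapply() passes each column to nlevels() as the factor it really is
via_vapply <- vapply(categorical, nlevels, integer(1))   # a = 1, b = 2
via_vapply == 1                                          # a TRUE, b FALSE
```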
indx <- sapply(df[,cat], nlevels)==1; df[,cat][,indx]
– akrun
length(levels())
– drmariod
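Both comments rest on the same idea: test the number of levels column by column with `sapply()` rather than `apply()`, and `nlevels(x)` is exactly `length(levels(x))`. A minimal variant on a hypothetical data frame (columns `g1`, `g2`, `n` and the names `one_level`, `single`, `df_reduced` are made up) folds the `is.factor()` check into the same test and uses `drop = FALSE`, so that when only one column matches the result is still a data frame rather than the plain vector that `df[,cat][,indx]` would return.

```r
# Hypothetical data: one single-level factor, one two-level factor, one numeric
df <- data.frame(
  g1 = factor(c("a", "a", "a")),
  g2 = factor(c("a", "b", "a")),
  n  = 1:3
)

# TRUE only for factor columns with exactly one level
one_level <- vapply(df, function(x) is.factor(x) && nlevels(x) == 1, logical(1))
names(df)[one_level]                            # "g1"

# drop = FALSE keeps a data frame even when a single column matches
single <- df[, one_level, drop = FALSE]

# The complement, e.g. to remove the constant factors from the dataset
df_reduced <- df[, !one_level, drop = FALSE]
```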