library(dplyr)
library(forcats)

Below is a simple dataframe containing three columns that need to be recoded into three categories - Satisfied, Dissatisfied, Neutral.

Respondent<-c("Respondent1","Respondent2","Respondent3","Respondent4","Respondent5")
Sat1<-c("1 Extremely dissatisfied","2 Moderately dissatisfied","2 Moderately Dissatisfied","4 Neutral","7 Extrmely satified")
Sat2<-c("7 Extremely Satisfied","2. Moderately dissatisfied","4 Neutral","3 Slightly dissatisfied","3 Slightly Dissatisfied")
Sat3<-c("1 Extremely dissatisfied","7 Extremely satisfied","6 Moderately satisfied","4. Neutral","3 Slightly dissatisfied")
Pet<-c("Cat","Cat","Dog","Hamster","Rabbit")

df<-data.frame(Respondent,Sat1,Sat2,Sat3,Pet)

I would like to use dplyr and forcats for the recoding. An example is below.

REC<-df%>%mutate_at(vars(Sat1:Sat3),funs(Rec=fct_collapse(.,
Satisfied=c("7 Extremely satisfied","6 Moderately satisfied","5 Slightly Satisfied"),
Dissatisfied=c("2 Moderately dissatisfied","1 Extremely dissatisfied"),
Neutral="4 Neutral")))

I need a function since I'll be doing this for multiple files. The function has to accommodate multiple variables as inputs and accommodate differences in spelling and punctuation for the different satisfaction categories. For example, "1 Extremely dissatisfied", or "1. Extremely dissatisfied", or "1 Extremely Dissatisfied", etc.

Below is an example function, but I'm not sure how to allow for a non-fixed number of "Var" variables (I would like to use the dots, ..., but had trouble making it work), as well as how to use something like "contains" or "matches" within the fct_collapse function to find all categories containing "Sat" or "sat" for the Satisfied recode, and "Dis" or "dis" for the Dissatisfaction category, and "Neutral" or "neutral" for the Neutral category.

REC<-function(df,Var){    
df%>%mutate_at(vars(Var),funs(Rec=fct_collapse(.,
Satisfied=c("7 Extremely satisfied","6 Moderately satisfied","5 Slightly Satisfied"),
Dissatisfied=c("2 Moderately dissatisfied","1 Extremely dissatisfied"),
Neutral="4 Neutral")))
}

or something like this...

Recode<-function(Df,Var,...){
Df%>%mutate_at(vars(Var),funs(Rec=fct_collapse(.,
Satisfied=c(select(matches("Sat|sat"),
Dissatisfied=c(select(matches("Dis"|"dis"),
Neutral="4 Neutral")))))))
}

1 Answer


The problem is that with factors you cannot account for different spellings: each spelling becomes its own level. To avoid that, build the data frame with stringsAsFactors = FALSE, or with data_frame, which does not coerce strings to factors.
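As a small illustration of that point, the sketch below feeds three spellings of the same answer to `factor()`. The vector name `spellings` is made up for this example; the strings are the variants mentioned in the question.

```r
# Three spellings of what is logically one answer.
spellings <- c("1 Extremely dissatisfied",
               "1. Extremely dissatisfied",
               "1 Extremely Dissatisfied")

# factor() treats every distinct string as its own level,
# so a fixed list of labels in fct_collapse() would miss the variants.
levels(factor(spellings))
nlevels(factor(spellings))  # 3
```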

If I understand your question correctly, you want to recode factors whose labels may contain spelling mistakes. I will assume that the first character is a number that identifies the intended level, regardless of what follows it.

Using stringr::str_sub, I extract that number and use it to drive fct_collapse as you wanted. Note: I added level "3", which does not appear in your mapping between categories and levels, and assumed that it belongs to "Dissatisfied". I also use dplyr::starts_with to select only the columns you want to change.
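To see the extraction step on its own, here is a minimal sketch. The names `answers` and `first_chars` are only for illustration; it relies on the assumption stated above that the leading character is always the scale number.

```r
library(stringr)

# Variants with different punctuation and spelling, taken from the sample data.
answers <- c("2 Moderately dissatisfied",
             "2. Moderately dissatisfied",
             "7 Extrmely satified")

# Keep only the first character, which is assumed to be the scale number.
first_chars <- str_sub(answers, 1, 1)
first_chars  # "2" "2" "7"
```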

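library(stringr)

# data_frame() keeps strings as character; newer tibble versions prefer tibble().
df <- data_frame(Respondent, Sat1, Sat2, Sat3, Pet)

# Take the leading digit of each answer, turn it into a factor with
# levels "1" to "7", then collapse those levels into three categories.
# funs() is deprecated in dplyr >= 0.8; a ~ lambda can replace it there.
df %>%
  mutate_at(vars(starts_with("Sat")),
            funs(fct_collapse(factor(str_sub(., 1, 1), levels = as.character(1:7)),
                              Satisfied = c("7", "6", "5"),
                              Dissatisfied = c("3", "2", "1"),
                              Neutral = "4")))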

Here is the output:

# A tibble: 5 × 5
   Respondent         Sat1         Sat2         Sat3     Pet
        <chr>       <fctr>       <fctr>       <fctr>   <chr>
1 Respondent1 Dissatisfied    Satisfied Dissatisfied     Cat
2 Respondent2 Dissatisfied Dissatisfied    Satisfied     Cat
3 Respondent3 Dissatisfied      Neutral    Satisfied     Dog
4 Respondent4      Neutral Dissatisfied      Neutral Hamster
5 Respondent5    Satisfied Dissatisfied Dissatisfied  Rabbit
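Since the question asks for a reusable function that accepts any number of columns, the same idea could be wrapped roughly as follows. This is only a sketch: `recode_sat` is a hypothetical name, it assumes dplyr 1.0.0 or later for `across()`, and it keeps the assumption that each answer starts with its scale number.

```r
library(dplyr)
library(forcats)
library(stringr)

# Hypothetical helper (not from the original post): recodes every column
# selected through ... using the leading digit of each answer.
recode_sat <- function(df, ...) {
  df %>%
    mutate(across(c(...), ~ fct_collapse(
      factor(str_sub(.x, 1, 1), levels = as.character(1:7)),
      Satisfied    = c("7", "6", "5"),
      Dissatisfied = c("3", "2", "1"),
      Neutral      = "4"
    )))
}

# Sample data from the question, kept as character columns.
df <- data.frame(
  Respondent = paste0("Respondent", 1:5),
  Sat1 = c("1 Extremely dissatisfied", "2 Moderately dissatisfied",
           "2 Moderately Dissatisfied", "4 Neutral", "7 Extrmely satified"),
  Sat2 = c("7 Extremely Satisfied", "2. Moderately dissatisfied", "4 Neutral",
           "3 Slightly dissatisfied", "3 Slightly Dissatisfied"),
  Sat3 = c("1 Extremely dissatisfied", "7 Extremely satisfied",
           "6 Moderately satisfied", "4. Neutral", "3 Slightly dissatisfied"),
  Pet = c("Cat", "Cat", "Dog", "Hamster", "Rabbit"),
  stringsAsFactors = FALSE
)

# Any tidyselect expression can be passed through the dots.
recoded <- recode_sat(df, starts_with("Sat"))
recoded
```

Columns can also be named one by one, for example `recode_sat(df, Sat1, Sat3)`. If a file does not start its answers with the scale number, this helper would not apply as written.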