
I am trying to collapse factor levels, and initially, the count(a7_edu2) output shows that the collapse has worked, but when I check the structure and look in the RStudio view, the change doesn't affect the actual variable.

Any advice for saving as a new variable or overwriting the old one? Thanks!

I have used fct_collapse to collapse into three categories and tried mutate() to create a new variable with the new levels. I have tried saving into a new variable and also transmute() instead of mutate(). I would be satisfied with either a new variable or replacing the old one.

SCI_dem %>%
  mutate(a7_edu2 = fct_collapse(a7_edu2,
    Highschool = c("Elm School", "Grade 7 or 8", "Grade 9 to 11", "High School Diploma", "G.E.D"),
    Diploma = c("Diploma or Certificate from trade tech school", "Diploma or Certificate from community college or CEGEP"),
    Bachelors = c("Bachelor degree", "Degree (Medicine, Dentistry etc)", "Masters degree", "Doctorate")
  )) %>%
  count(a7_edu2) # this is the result I want, but when I check the structure, it doesn't save!


str(SCI_dem$a7_edu2)

I expected the output to be: Factor w/ 4 levels "Highschool", "Diploma", "Bachelors", "other", but instead it gave the original: Factor w/ 13 levels "Elm School","Grade 7 or 8",..: 8 7 6 10 7 7 8 3 7 10 ...


UPDATED QUESTION: Saving the one variable to a new data frame (SCI_collapse) works. However, when I try to save other newly collapsed variables to the same data frame, it overwrites the previous collapses. I have tried specifying new columns such as SCI_collapse$edu, but then it renames the existing variables in the data frame. How can I collapse multiple variables and add each of them to a new data frame? Any suggestions for saving the results or writing a pipe?

SCI_collapse <- SCI_dem %>%
  mutate(a7_edu2 = fct_collapse(a7_edu2,
    Highschool = c("Elm School",
                   "Grade 7 or 8",
                   "Grade 9 to 11",
                   "High School Diploma",
                   "G.E.D"),
    Diploma = c("Diploma or Certificate from trade tech school",
                "Diploma or Certificate from community college or CEGEP"),
    Bachelors = c("Bachelor degree",
                  "Degree (Medicine, Dentistry etc)",
                  "Masters degree", "Doctorate")))
The functions in dplyr, like mutate, return new/updated data frames; they do not update the original data frame in place. Be sure to save the results from the mutate to some variable. - MrFlick
Despite my earlier attempts at specifying a new variable, saving to a new dataset now works, thank you! SCI_collapse <- SCI_dem %>% mutate(a7_edu2 = fct_collapse(a7_edu2, Highschool = c("Elm School", "Grade 7 or 8", "Grade 9 to 11", "High School Diploma", "G.E.D"), Diploma = c("Diploma or Certificate from trade tech school", "Diploma or Certificate from community college or CEGEP"), Bachelors = c("Bachelor degree", "Degree (Medicine, Dentistry etc)", "Masters degree", "Doctorate"))) - Cassandra
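
The point made in the comments (a pipeline that is not assigned leaves the original data frame untouched) can be sketched on a toy data frame. The names df and edu below are hypothetical stand-ins, not the real SCI_dem columns, and the sketch assumes dplyr and forcats are installed.

```r
library(dplyr)
library(forcats)

# Toy data; `df` and `edu` are hypothetical names, not from the question
df <- data.frame(edu = factor(c("Grade 7 or 8", "G.E.D", "Doctorate")))

# Without assignment, the collapsed result is printed and then discarded
df %>%
  mutate(edu = fct_collapse(edu, Highschool = c("Grade 7 or 8", "G.E.D")))
nlevels(df$edu) # still 3: the original factor is unchanged

# Assigning the result back is what makes the change persist
df <- df %>%
  mutate(edu = fct_collapse(edu, Highschool = c("Grade 7 or 8", "G.E.D")))
nlevels(df$edu) # now 2: "Doctorate" and "Highschool"
```

The same holds when a pipeline ends in count(): the counted summary reflects the collapsed levels, but nothing is stored unless the mutate() result itself is assigned.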

1 Answer


This is what I ended up doing:

# Collapse levels (education)
SCI_dem <- SCI_dem %>%
  mutate(a7_edu2_col = fct_collapse(a7_edu2, # Save as new variable ending in _col
    Highschool = c("Elm School", "Grade 7 or 8", "Grade 9 to 11", "High School Diploma", "G.E.D"),
    Diploma = c("Diploma or Certificate from trade tech school", "Diploma or Certificate from community college or CEGEP"),
    Bachelors = c("Bachelor degree", "Degree (Medicine, Dentistry etc)", "Masters degree", "Doctorate"),
    Other = c("Other", "Prefer not to answer")
  ),
  a7_edu2_col = droplevels(a7_edu2_col)) %>% # drop empty levels of _col
  rename(a7_edu2_unc = a7_edu2)

I now have new variables ending in _col and have renamed the old variables to end in _unc (for uncollapsed). Then I clean things up by removing the columns ending in _unc.

SCI_dem <- select(SCI_dem, -ends_with("_unc"))

Which leaves me with my uncluttered, collapsed dataframe :)
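
For the follow-up about collapsing several variables without overwriting earlier results, one possible approach is to do all the collapses in a single mutate() call, giving each result its own _col name, and assign the whole pipeline once. This is only a sketch under assumptions: the toy columns edu and job and the level names are hypothetical, and only the _col suffix convention comes from the answer above.

```r
library(dplyr)
library(forcats)

# Toy data; `edu` and `job` are hypothetical stand-ins for the survey columns
df <- data.frame(
  edu = factor(c("Grade 7 or 8", "G.E.D", "Doctorate", "Masters degree")),
  job = factor(c("Full time", "Part time", "Retired", "Full time"))
)

df_collapsed <- df %>%
  mutate(
    # Each collapse gets its own new column, so nothing is overwritten
    edu_col = fct_collapse(edu,
      Highschool = c("Grade 7 or 8", "G.E.D"),
      Bachelors  = c("Masters degree", "Doctorate")
    ),
    job_col = fct_collapse(job,
      Working = c("Full time", "Part time")
    )
  ) %>%
  select(ends_with("_col")) # keep only the collapsed columns

str(df_collapsed)
```

Because every collapse happens inside one assigned pipeline, a later collapse cannot clobber an earlier one. Dropping the final select() keeps the original columns alongside the collapsed ones.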