72
votes

This is probably simple and I feel stupid for asking. I want to change the levels of a factor in a data frame, using mutate. Simple example:

library("dplyr")
dat <- data.frame(x = factor("A"), y = 1)
mutate(dat,levels(x) = "B")

I get:

Error: Unexpected '=' in "mutate(dat,levels(x) ="

Why is this not working? How can I change factor levels with mutate?

6
Perhaps dat %>% mutate(x=factor(x, labels='B')) BTW, the pipe operator is not correct in your code - akrun
So, I can't use levels() in mutate? I need to explicitly encode the variable as a factor again? Hmm... (%<>% should be okay, it pipes and assigns the dat) - user3393472
Yes, I want to mutate x. I thought "levels(x)" would be enough for mutate to figure out that I want to mutate x. I guess that's a design choice, as it works that way with "within". - user3393472
It may be possible using magrittr or other packages, but why do you need to go this route? It is very easy to do levels(dat$x) <- 'B' - akrun
Wouldn't this do more than change the levels of the factor? It actually changes the values of the factor itself. That seems dangerous and leads to some odd behavior. dat <- data.frame(x = factor(c('A', 'B', 'A')), y = c(1:3)); levels(dat$x) <- c('b', 'a', 'b'); dat - jraab
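
For readers following the comments: mutate() expects arguments of the form name = value, and levels(x) is not a valid argument name, which is why the parser stops at the '='. A minimal sketch of one workaround, assuming dplyr is installed: the replacement function `levels<-` is an ordinary function, so it can be called on the right-hand side. The three-row data frame is made up for illustration.

```r
library(dplyr)

# Illustrative data, slightly larger than the one-row example in the question
dat <- data.frame(x = factor(c("A", "B", "A")), y = 1:3)

# `levels<-`(x, value) returns a copy of x with its levels renamed by position
out <- mutate(dat, x = `levels<-`(x, c("b", "a")))

as.character(out$x)  # "b" "a" "b"
levels(out$x)        # "b" "a"
```

As the last comment points out, renaming levels also changes the values you see, because a factor stores integer codes that point into its levels.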

6 Answers

68
votes

With the forcats package from the tidyverse this is easy, too.

mutate(dat, x = fct_recode(x, "B" = "A"))
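
A slightly larger sketch of the same idea, assuming dplyr and forcats are installed. The data and the new level names ("first", "second") are made up for illustration. Note that fct_recode() takes pairs in the form new = old.

```r
library(dplyr)
library(forcats)

dat <- data.frame(x = factor(c("A", "B", "A")), y = 1:3)

# Pairs are written as "new name" = "old name"
out <- mutate(dat, x = fct_recode(x, "first" = "A", "second" = "B"))

levels(out$x)        # "first" "second"
as.character(out$x)  # "first" "second" "first"
```
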
42
votes

I'm not quite sure I understand your question properly, but if you want to change the factor levels of cyl with mutate() you could do:

df <- mtcars %>% mutate(cyl = factor(cyl, levels = c(4, 6, 8)))

You would get:

#> str(df$cyl)
# Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
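
One thing to watch with factor(x, levels = ...): any value that is not listed in levels silently becomes NA. A small base R sketch, using a made-up vector rather than the full mtcars column:

```r
cyl <- c(6, 6, 4, 6, 8)

# All observed values are listed: nothing is lost
full <- factor(cyl, levels = c(4, 6, 8))

# 8 is missing from levels, so the last element turns into NA
partial <- factor(cyl, levels = c(4, 6))

anyNA(full)     # FALSE
is.na(partial)  # FALSE FALSE FALSE FALSE TRUE
```
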
23
votes

Maybe you are looking for this plyr::revalue function:

mutate(dat, x = plyr::revalue(x, c("A" = "B")))

You can see plyr::mapvalues too.
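
A minimal sketch of the mapvalues variant, assuming dplyr and plyr are installed. The functions are called with their namespace prefix so that plyr does not have to be attached (attaching it after dplyr masks several dplyr functions). The data is made up for illustration.

```r
dat <- data.frame(x = factor(c("A", "B", "A")), y = 1:3)

# mapvalues() takes the old and new values as two parallel vectors
out <- dplyr::mutate(
  dat,
  x = plyr::mapvalues(x, from = c("A", "B"), to = c("a", "b"))
)

levels(out$x)        # "a" "b"
as.character(out$x)  # "a" "b" "a"
```
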

21
votes

You can use the recode function from dplyr.

df <- iris %>%
  mutate(Species = recode(Species,
    setosa = "SETOSA",
    versicolor = "VERSICOLOR",
    virginica = "VIRGINICA"
  ))
16
votes

Can't comment because I don't have enough reputation points, but recode only works on a vector, so the above code in @Stefano's answer should be

df <- iris %>%
  mutate(Species = recode(Species, 
     setosa = "SETOSA",
     versicolor = "VERSICOLOR",
     virginica = "VIRGINICA")
  )
12
votes

From my understanding, the currently accepted answer only changes the order of the factor levels, not the actual labels (i.e., what the levels of the factor are called). To illustrate the difference between levels and labels, consider the following example:

Turn cyl into a factor (specifying levels is not strictly necessary here, since factor() sorts them in ascending order by default):

    mtcars2 <- mtcars %>% mutate(cyl = factor(cyl, levels = c(4, 6, 8))) 
    mtcars2$cyl[1:5]
    #[1] 6 6 4 6 8
    #Levels: 4 6 8

Change the order of the levels (but not the labels themselves: cyl still holds the same values):

    mtcars3 <- mtcars2 %>% mutate(cyl = factor(cyl, levels = c(8, 6, 4))) 
    mtcars3$cyl[1:5]
    #[1] 6 6 4 6 8
    #Levels: 8 6 4
    all(mtcars3$cyl==mtcars2$cyl)
    #[1] TRUE
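
What does change when the levels are reordered is the underlying integer coding. A base R sketch of that, using plain vectors (f1 and f2 are illustrative names, standing in for mtcars2$cyl and mtcars3$cyl):

```r
f1 <- factor(mtcars$cyl, levels = c(4, 6, 8))
f2 <- factor(f1, levels = c(8, 6, 4))

# The visible values are the same...
as.character(f1)[1:5]  # "6" "6" "4" "6" "8"
as.character(f2)[1:5]  # "6" "6" "4" "6" "8"

# ...but the integer codes point into differently ordered level vectors
as.integer(f1)[1:5]    # 2 2 1 2 3
as.integer(f2)[1:5]    # 2 2 3 2 1
```
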

Assign new labels to cyl. The order of the levels in mtcars3 was c(8, 6, 4), hence we specify the new labels as follows:

    mtcars4 <- mtcars3 %>% mutate(cyl = factor(cyl, labels = c("new_value_for_8", 
                                                               "new_value_for_6", 
                                                               "new_value_for_4" )))
    mtcars4$cyl[1:5]
    #[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
    #Levels: new_value_for_8 new_value_for_6 new_value_for_4

Note how this column differs from the earlier ones:

    all(as.character(mtcars4$cyl) != mtcars3$cyl)
    #[1] TRUE
    #Note: TRUE here means that every value differs, because != is used instead of ==
    #as.character() is needed because comparing two factors with different level sets raises an error
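
A small base R sketch of why the conversion is needed, using two made-up factors (a and b) whose level sets differ:

```r
a <- factor(c("4", "6"))
b <- factor(c("four", "six"))

# Comparing two factors with different level sets is an error;
# tryCatch() captures the message instead of stopping the script
msg <- tryCatch(a == b, error = function(e) conditionMessage(e))
msg

# Converting to character first makes the comparison well defined
all(as.character(a) != as.character(b))  # TRUE
```

Two factors with the same set of levels in a different order can be compared directly, which is why the mtcars2/mtcars3 comparison works without as.character().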

More details:

If we were to change the labels of cyl starting from mtcars2 instead of mtcars3, we would need to specify the labels in a different order to get the same result. The order of the levels in mtcars2 was c(4, 6, 8), hence we specify the new labels as follows:

    #change labels of mtcars2 (order used to be: c(4, 6, 8))
    mtcars5 <- mtcars2 %>% mutate(cyl = factor(cyl, labels = c("new_value_for_4", 
                                                               "new_value_for_6", 
                                                               "new_value_for_8" )))

Unlike mtcars3$cyl and mtcars4$cyl, the values of mtcars4$cyl and mtcars5$cyl are thus identical, even though their levels are in a different order.

    mtcars4$cyl[1:5]
    #[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
    #Levels: new_value_for_8 new_value_for_6 new_value_for_4

    mtcars5$cyl[1:5]
    #[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
    #Levels: new_value_for_4 new_value_for_6 new_value_for_8

    all(mtcars4$cyl==mtcars5$cyl)
    #[1] TRUE

    levels(mtcars4$cyl) == levels(mtcars5$cyl)
    #[1] FALSE  TRUE FALSE
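
Because labels are matched to levels by position, it is easy to get them wrong when the level order is not what you expect. One way around that is to spell out old and new names together. A base R sketch: relabel() and map are hypothetical names introduced here, not part of any package.

```r
# Hypothetical helper: `map` is a named character vector, old name = new label
relabel <- function(x, map) {
  factor(as.character(x), levels = names(map), labels = unname(map))
}

map <- c("4" = "new_value_for_4",
         "6" = "new_value_for_6",
         "8" = "new_value_for_8")

f_asc  <- factor(mtcars$cyl, levels = c(4, 6, 8))
f_desc <- factor(mtcars$cyl, levels = c(8, 6, 4))

# The result no longer depends on the level order of the input
r1 <- relabel(f_asc, map)
r2 <- relabel(f_desc, map)
identical(r1, r2)  # TRUE
levels(r1)         # "new_value_for_4" "new_value_for_6" "new_value_for_8"
```

Inside a pipeline this could be used as mutate(cyl = relabel(cyl, map)); any value missing from map would become NA.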