
I'm trying to conditionally replace levels of a factor variable using dplyr/tidyr. Here's some toy data (the real dataset is much larger and more complex):

library(dplyr)

dat <- data.frame(animal=c("cat", "cat", "dog", "cat"),
                  size=c("big", "big", "big", "small"),
                  stringsAsFactors = TRUE)  # factor columns (the default before R 4.0.0)

newdata <- dat %>% mutate(newanimal=replace(animal, animal=='cat' & size=='big', "fatcat"))

And I keep getting the warning "invalid factor level, NA generated". Why?! These are factor variables, and the specific combination of 'cat' and 'big' exists in the data frame, so why do I get this?

Just do dat %>% mutate(newanimal=replace(animal, animal=='cat' & size=='big', "fatcat")) - Matt
I already did that and I get the same error about invalid factor level. - MeC
One question. You are using filter here. After the substitution, do you want the whole table or just the filtered columns? - Marco De Virgilis
Run your code; it's not reproducible! as_data_frame doesn't work like this. - M--
Also, as_data_frame is deprecated in favor of as_tibble. But to create a data frame, call tibble or data.frame directly - camille

3 Answers


As @camille mentioned, once you have a factor, its set of levels is fixed: if you assign a value that is not one of the existing levels, that element becomes NA.

For example:

x <- factor(letters[1:3])
x[3] = "d"
Warning message:
In `[<-.factor`(`*tmp*`, 3, value = "d") :
  invalid factor level, NA generated
x
[1] a    b    <NA>
Levels: a b c
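Applied to the question, this explains why the matching rows do not help: the condition finds the rows fine, but the replacement value "fatcat" is not a level of `animal`. A small base R check, assuming the question's toy data is built with `stringsAsFactors = TRUE` so that the columns are factors as the asker describes:

```r
# Toy data from the question, with factor columns
dat <- data.frame(animal = c("cat", "cat", "dog", "cat"),
                  size   = c("big", "big", "big", "small"),
                  stringsAsFactors = TRUE)

# The rows being matched do exist (two of them)...
n_matches <- sum(dat$animal == "cat" & dat$size == "big")

# ...but the value being written into them is not an allowed level
current_levels <- levels(dat$animal)          # "cat" and "dog" only
has_fatcat <- "fatcat" %in% levels(dat$animal)  # FALSE
```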

One way around this is to convert the column to character first and then replace:

newdata <- dat %>% mutate(newanimal=replace(as.character(animal), animal=='cat' & size=='big', "fatcat"))
newdata
  animal  size newanimal
1    cat   big    fatcat
2    cat   big    fatcat
3    dog   big       dog
4    cat small       cat

Your new column is now character, as str() shows, but you can convert it back to a factor if you need one.

str(newdata)
'data.frame':   4 obs. of  3 variables:
 $ animal   : Factor w/ 2 levels "cat","dog": 1 1 2 1
 $ size     : Factor w/ 2 levels "big","small": 1 1 1 2
 $ newanimal: chr  "fatcat" "fatcat" "dog" "cat"
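Converting back is a single `factor()` call. A base R sketch of the same character-then-factor round trip, without dplyr (the column names follow the question; building the toy data with `stringsAsFactors = TRUE` is an assumption):

```r
# Toy data from the question, with factor columns
dat <- data.frame(animal = c("cat", "cat", "dog", "cat"),
                  size   = c("big", "big", "big", "small"),
                  stringsAsFactors = TRUE)

# Do the replacement on a character copy, so no level check applies
newanimal <- replace(as.character(dat$animal),
                     dat$animal == "cat" & dat$size == "big", "fatcat")

# Convert back to a factor; by default the levels are the sorted unique values
dat$newanimal <- factor(newanimal)
levels(dat$newanimal)  # levels: "cat", "dog", "fatcat"
```

If you need a specific level order, pass it explicitly, e.g. `factor(newanimal, levels = c("fatcat", "cat", "dog"))`.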

Another option in the tidyverse is to use forcats::fct_expand to add the new level and then pipe the expanded factor into the original replace, which now works as expected because "fatcat" is a valid level. The new variable is a factor, so no further conversion is necessary (assuming a factor is the output you want).

library(tidyverse)

dat <- dat %>% 
  mutate(newanimal = fct_expand(animal, "fatcat") %>% 
                     replace(., animal == "cat" & size == "big", "fatcat")
         ) 

glimpse(dat)
Observations: 4
Variables: 3
$ animal    <fct> cat, cat, dog, cat
$ size      <fct> big, big, big, small
$ newanimal <fct> fatcat, fatcat, dog, cat

If you use this kind of factor replacement a lot, you could write your own helper function:

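replace_fct <- function(x, list, values) {
  # add any replacement values that are not yet levels of x
  .x <- forcats::fct_expand(x, unique(values))
  # replace() now works because every value is a valid level
  replace(.x, list, values)
}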

And then do:

dat %>% 
  mutate(newanimal = replace_fct(animal, animal == "cat" & size == "big", "fatcat")
  ) 
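If you would rather not depend on forcats, the same helper can be sketched in base R by extending `levels()` directly. `replace_fct_base` is a hypothetical name, and the sketch assumes `x` is already a factor:

```r
# Hypothetical base R counterpart of replace_fct(); assumes x is a factor
replace_fct_base <- function(x, list, values) {
  # union() keeps the existing levels in order and appends only new ones
  levels(x) <- union(levels(x), unique(values))
  # every value is now a valid level, so replace() no longer produces NA
  replace(x, list, values)
}

# Toy data from the question, with factor columns
dat <- data.frame(animal = c("cat", "cat", "dog", "cat"),
                  size   = c("big", "big", "big", "small"),
                  stringsAsFactors = TRUE)

dat$newanimal <- replace_fct_base(dat$animal,
                                  dat$animal == "cat" & dat$size == "big",
                                  "fatcat")
```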

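You can try this, keeping the columns as character while you replace with ifelse() and converting them to factors at the end: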

library(tidyverse)

dat <- tibble(animal = c("cat","dog","cat","dog","dog","dog"), 
              size =  c("big", "small", "big", "big", "big","big"))

dat %>% mutate(new_animal = ifelse(animal=='cat' & size=='big','fatcat',animal) ) %>% 
  mutate_if(is.character, as.factor)
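This works because `tibble()` keeps `animal` as character. If `animal` is a factor, as in the question, `ifelse()` does not preserve it: the unchanged rows come back as the factor's integer codes rather than as "dog"/"cat". A base R sketch of that pitfall and the usual fix of wrapping the factor in `as.character()`, assuming the question's toy data with factor columns:

```r
# Toy data from the question, with factor columns
dat <- data.frame(animal = c("cat", "cat", "dog", "cat"),
                  size   = c("big", "big", "big", "small"),
                  stringsAsFactors = TRUE)
is_fatcat <- dat$animal == "cat" & dat$size == "big"

# Pitfall: the "no" branch is a factor, so its integer codes are what get
# copied into the (character) result instead of the labels
bad <- ifelse(is_fatcat, "fatcat", dat$animal)

# Fix: work with character labels, then convert the result to a factor
good <- factor(ifelse(is_fatcat, "fatcat", as.character(dat$animal)))
```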