7
votes

I would really like pivot_wider to create a column with NAs if the level of a factor exists but never appears in the data when it's used as a names_from argument. For example, the first line gives me a two column tibble, but I'd really like the three column tibble below.

tibble(Person=c("Sarah", "Jackson", "Jackson"), Rank=c(1,1,2), 
       FavoriteAnimal=factor(c("Dog", "Dog", "Cat")))%>%
    group_by(Person)%>%arrange(Rank)%>%slice(1)%>%
    pivot_wider(names_from = FavoriteAnimal, values_from=Rank)

tibble(Person=c("Jackson", "Sarah"), Dog=c(1,1), Cat=c(NA,NA))

How can I get my column of NAs for levels not appearing in my dataset?

3

3 Answers

6
votes

Alternatively, you can first add the missing levels and then do the transformation:

tibble(Person = c("Sarah", "Jackson", "Jackson"), 
       Rank = c(1, 1, 2), 
       FavoriteAnimal = factor(c("Dog", "Dog", "Cat"))) %>%
 group_by(Person) %>%
 arrange(Rank) %>% 
 slice(1) %>%
 complete(FavoriteAnimal = FavoriteAnimal) %>%
 pivot_wider(names_from = FavoriteAnimal, values_from = Rank)

  Person    Cat   Dog
  <chr>   <dbl> <dbl>
1 Jackson    NA     1
2 Sarah      NA     1
4
votes

You can do it with tidyr::spread - spread(key = FavoriteAnimal, value = Rank, drop = FALSE) gives you what you want.

Unfortunately the drop argument seems to have been lost in the transition from spread to pivot_wider.

0
votes

It seems that your slice(1) operation is removing Jackson's Cat ranking. If you remove it from your operation, and add the pivot_wider parameter values_fill = NA you get a 3 column tbl. The only difference between my answer and your goal is that my answer retains Jackson's Cat ranking value.

tibble(Person=c("Sarah", "Jackson", "Jackson"), Rank=c(1,1,2), 
       FavoriteAnimal=factor(c("Dog", "Dog", "Cat")))%>%
    group_by(Person)%>%arrange(Rank)%>%
    pivot_wider(names_from = FavoriteAnimal, values_from=Rank, values_fill = NA)

Based on the help documentation for dplyr::slice, it appears that you are trying to select only the top ranked animal for each person, so this solution doesn't do that. But depending on where you are going with this, there may be other ways such as dplyr::select or dplyr::filter perhaps combined with dplyr::across to accomplish this.