I am looking for a tidyverse solution to the following problem. My dataset contains values of Y for several levels of a factor X. I would like to add a new level "Total" whose Y value is the sum of Y across all existing levels of X. This can be done, for example, with:
library(dplyr)
library(forcats)
mutate(Data, X = fct_collapse(X, Total = c("A", "B", "C", "D"))) %>%
  group_by(X) %>%
  summarize(Y = sum(Y))
However, this also necessarily overwrites the original factor levels. I would have to combine the original dataset with the new collapsed dataset in an additional step.
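For reference, that additional step can stay inside the same workflow by binding the collapsed total back onto the untouched original rows. This is a minimal sketch assuming dplyr >= 1.0, whose bind_rows() merges differing factor levels into their union (older versions fell back to character with a warning); the intermediate names `totals` and `result` are only illustrative.

```r
library(dplyr)
library(forcats)

Data <- data.frame(
  X = factor(c("A", "B", "C", "D")),
  Y = c(1000, 2000, 3000, 4000)
)

# Collapse every level into "Total" and sum Y (the approach from the question)
totals <- Data %>%
  mutate(X = fct_collapse(X, Total = c("A", "B", "C", "D"))) %>%
  group_by(X) %>%
  summarize(Y = sum(Y))

# Append the total row to the original rows; with dplyr >= 1.0 the
# combined factor keeps the levels A, B, C, D and adds "Total" at the end
result <- bind_rows(Data, totals)
result
```

Because the original rows are never modified, the existing levels survive and X remains a factor. To avoid hard-coding the level names, `fct_collapse(X, Total = levels(X))` should collapse all levels the same way.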
One solution I have used in the past to retain the original levels is to reshape the data to wide format, use rowwise() and mutate() to create a new variable "Total", and then reshape back to long format.
library(dplyr)
library(tidyr)
spread(Data, key = X, value = Y) %>%
  rowwise() %>%
  mutate(Total = sum(A, B, C, D)) %>%
  gather(1:5, key = "X", value = "Y")
However, I am not happy with this solution, since using rowwise() is generally not considered good practice. Could you point me to an alternative way to combine data across different factor levels while retaining the original levels?
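If the wide-format route is kept, rowwise() is not actually needed: after reshaping there is one column per level, so the total is an ordinary vectorised column sum. The sketch below assumes tidyr >= 1.0 and uses pivot_wider() and pivot_longer(), the current replacements for spread() and gather(); the name `result` is illustrative.

```r
library(dplyr)
library(tidyr)

Data <- data.frame(
  X = factor(c("A", "B", "C", "D")),
  Y = c(1000, 2000, 3000, 4000)
)

result <- Data %>%
  # One column per level of X
  pivot_wider(names_from = X, values_from = Y) %>%
  # Plain vectorised addition; no rowwise() needed
  mutate(Total = A + B + C + D) %>%
  # Back to long format; X comes back as a character column
  pivot_longer(everything(), names_to = "X", values_to = "Y")
result
```

This reproduces the expected 5 x 2 tibble with a character X column. With many levels, summing the columns with rowSums() instead of writing out A + B + C + D would be one way to avoid listing them by hand.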
Minimal reproducible example:
Data <- data.frame(
  X = factor(c("A", "B", "C", "D")),
  Y = c(1000, 2000, 3000, 4000)
)
Expected result:
# A tibble: 5 x 2
  X         Y
  <chr> <dbl>
1 A      1000
2 B      2000
3 C      3000
4 D      4000
5 Total 10000
Data %>% janitor::adorn_totals("row"): does this work for you? It requires loading an additional package, and Total will not be added as a factor. – M--
… bind_self to a package I wrote for work: github.com/camille-s/camiller/blob/master/R/bind_self.R It's a little beyond the scope of an SO answer. – camille
… tidyverse solution? I am asking conceptually, not arguing about what you may prefer, which is, obviously, totally up to you. – M--