While using R, I am often interested in performing operations on a data.frame in which I summarize a variable by a group, and then want to add those summary values back into the data.frame. This is most easily shown by example:
myDF <- data.frame(A = runif(5), B = c("A", "A", "A", "B", "B"))
myDF$Total <- with(myDF, by(A, B, sum))[myDF$B]
myDF$Proportion <- with(myDF, A / Total)
which produces:
          A B     Total Proportion
1 0.5272734 A 1.7186369  0.3067975
2 0.5105128 A 1.7186369  0.2970452
3 0.6808507 A 1.7186369  0.3961574
4 0.2892025 B 0.6667133  0.4337734
5 0.3775108 B 0.6667133  0.5662266
This trick -- essentially getting a vector of named values, and "spreading" or "stretching" them across the relevant rows by group -- generally works, although class(myDF$Total) is "array" unless I put the by() inside of a c().
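To make the c() workaround concrete, here is a minimal sketch of what I mean. The seed, the groupSums name, and the unname() and as.character() calls are my own additions for illustration, not part of the original two-liner:

```r
# Example data in the same shape as above (the seed is an assumption).
set.seed(42)
myDF <- data.frame(A = runif(5), B = c("A", "A", "A", "B", "B"))

# Per-group sums. Wrapping by() in c() drops the "by" class and the dim
# attribute, leaving a plain numeric vector named by group.
groupSums <- c(with(myDF, by(A, B, sum)))

# Index by group label so each row picks up its own group's total.
# as.character() guards against B being a factor (the default before R 4.0);
# unname() just removes the leftover group names from the new column.
myDF$Total <- unname(groupSums[as.character(myDF$B)])
myDF$Proportion <- with(myDF, A / Total)

class(myDF$Total)  # "numeric"
```

It works, but it still feels like a lookup table bolted onto the data.frame rather than a proper grouped operation.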
I am wondering:
- Is there a commonly-used name for this operation?
- Is there another, less hacky-feeling, and/or faster way of doing this?
- Is there a way to do this with dplyr? Maybe there is a Hadley-approved verb operation (like mutate, arrange, etc.) about which I am unaware. I know that it is easy to summarise(), but I often need to put those summaries back into the data.frame.
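For anyone comparing approaches, the two candidates I would expect to fit are sketched below: base R's ave(), and mutate() applied after group_by() in dplyr. This is only a sketch under my own assumptions (the seed, the myDF2 name, and the requireNamespace() guard are illustrative), not a claim that either is the canonical answer:

```r
# Example data in the same shape as above (the seed is an assumption).
set.seed(42)
myDF <- data.frame(A = runif(5), B = c("A", "A", "A", "B", "B"))

# Base R: ave() applies FUN within each group and returns a plain vector
# already aligned with the rows of the input, so no re-indexing is needed.
myDF$Total <- ave(myDF$A, myDF$B, FUN = sum)
myDF$Proportion <- myDF$A / myDF$Total

# dplyr (run only if the package is installed): mutate() on a grouped
# data frame evaluates sum(A) per group and recycles it across the rows
# of that group, unlike summarise(), which collapses each group to one row.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  myDF2 <- myDF[, c("A", "B")] %>%
    group_by(B) %>%
    mutate(Total = sum(A), Proportion = A / Total) %>%
    ungroup()
}
```

Both variants keep one row per original observation, which is the part summarise() on its own does not give me.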