I have a data.frame such as this (the real data set has many more rows and columns):
set.seed(15)
dd <- data.frame(id=letters[1:4], matrix(runif(5*4), nrow=4))
# id X1 X2 X3 X4 X5
# 1 a 0.6021140 0.3670719 0.6872308 0.5090904 0.4474437
# 2 b 0.1950439 0.9888592 0.8314290 0.7066286 0.9646670
# 3 c 0.9664587 0.8151934 0.1046694 0.8623137 0.1411871
# 4 d 0.6509055 0.2539684 0.6461509 0.8417851 0.7767125
I would like to be able to write a dplyr statement where I can select a subset of columns and mutate them. (I'm trying to do something similar to using .SDcols in data.table).
For a simplified example, I would like to add columns for the sums and means of the even "X" columns while preserving all other columns. The desired output using base R is:
(cols<-paste0("X", c(2,4)))
# [1] "X2" "X4"
cbind(dd,evensum=rowSums(dd[,cols]),evenmean=rowMeans(dd[,cols]))
# id X1 X2 X3 X4 X5 evensum evenmean
# 1 a 0.6021140 0.3670719 0.6872308 0.5090904 0.4474437 0.8761623 0.4380811
# 2 b 0.1950439 0.9888592 0.8314290 0.7066286 0.9646670 1.6954878 0.8477439
# 3 c 0.9664587 0.8151934 0.1046694 0.8623137 0.1411871 1.6775071 0.8387535
# 4 d 0.6509055 0.2539684 0.6461509 0.8417851 0.7767125 1.0957535 0.5478768
but I wanted to use a dplyr-like chain to do the same thing. In the general case, I'd like to be able to use any of select()'s helper functions, such as starts_with, ends_with, matches, etc., and any function. Here's what I tried:
library(dplyr)
partial_mutate1 <- function(x, colspec, ...) {
  select_(x, .dots=list(lazyeval::lazy(colspec))) %>%
    transmute_(.dots=lazyeval::lazy_dots(...)) %>%
    cbind(x,.)
}
dd %>% partial_mutate1(num_range("X", c(2,4)),
                       evensum=rowSums(.), evenmean=rowMeans(.))
However, this throws an error that says
Error in rowSums(.) : 'x' must be numeric
Which appears to be because . seems to refer to the entire data.frame rather than the selected subset (it is the same error as rowSums(dd)). However, note that this produces the desired output:
partial_mutate2 <- function(x, colspec) {
  select_(x, .dots=list(lazyeval::lazy(colspec))) %>%
    transmute(evensum=rowSums(.), evenmean=rowMeans(.)) %>%
    cbind(x,.)
}
dd %>% partial_mutate2(seq(2,ncol(dd),2))
I'm guessing this is some sort of environment problem? Any suggestions on how to pass the arguments to partial_mutate1 so that the . will correctly take values from the "select()-ed" dataset?
dd %>% select(X2, X4) %>% mutate(evensum = rowSums(.), evenmean = rowMeans(.)) %>% select(-X2, -X4) %>% cbind(., dd) – Steven Beaupré
%>%. In other words, with rowMeans(.) buried inside .dots, %>% has no way of knowing it should be substituting the data there as well. This is just a guess. – BrodieG
mutate(dd[,-1], sums=rowSums(.)) doesn't work ("object '.' not found"). So the . symbol isn't special to dplyr. Trying to use it to apply a function across columns seems to be the wrong idea. I guess I should be reshaping the data to a "tidy" format first. – MrFlick
starts_with and other select helper functions - the syntax which was suggested by @Brandon Bertelsen now seems to work, i.e. mutate(new_col = rowSums(select(., starts_with(string)))) – tjebo
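For reference, here is a minimal sketch of the syntax mentioned in the last comment, applied to the dd example from the question. It assumes a reasonably recent dplyr and the magrittr pipe (%>%), which is what binds . to the piped-in data frame; the native |> pipe does not do this. The name res is only for illustration. Since . is the whole of dd inside mutate(), the columns are narrowed with select() before being passed to rowSums() and rowMeans().

```r
library(dplyr)

set.seed(15)
dd <- data.frame(id = letters[1:4], matrix(runif(5 * 4), nrow = 4))

# `.` is the full piped-in data frame, so select() the wanted columns
# (here X2 and X4, via the num_range() helper) before summing/averaging.
res <- dd %>%
  mutate(
    evensum  = rowSums(select(., num_range("X", c(2, 4)))),
    evenmean = rowMeans(select(., num_range("X", c(2, 4))))
  )

res
```

Any other select helper (starts_with, ends_with, matches, ...) can be swapped in for num_range here, and all original columns are kept because mutate() only appends the new ones.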