
I'm wondering whether there is an easy way to use dplyr's variable selection functions in my own custom functions. By dplyr's variable selection functions, I mean these: https://github.com/hadley/dplyr/blob/master/R/select-utils.R

Or, if you're familiar with dplyr, things like "contains", "one_of", "starts_with", etc.

What I'd like to be able to do is write a function that only operates on certain variables:

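# vars: character vector of column names
foo = function(df, vars){
  for (var in vars){
    # df$var would look for a column literally named "var"; [[ ]] uses the value of var
    df[[var]] = as.character(df[[var]])
  }
  df
}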

I'm aware of dplyr's "mutate_each" function, which allows me to do this, but then I have to write a function that operates on a vector rather than one that operates on a data.frame.

The purpose of my question is to be able to more cleanly add a custom function to a data processing pipeline. For example, I want to ultimately do this:

df %>%
  foo(starts_with("varname"))

Rather than

df %>%
  mutate_each(funs(foo), starts_with("varname"))

I hope this makes sense. Thanks!

If I understand correctly, that means you would need to write a custom function for each function you want to use in that way: one for sum, one for mean, and so on. Is that what you want? - talat
Correct, each function would do one thing. - dreww2
And do you want the result of df %>% foo(starts_with("varname")) to be identical to the result of df %>% mutate_each(funs(foo), starts_with("varname")), i.e. a data.frame? - talat
Yes, I'm looking for the return object to be a data.frame. - dreww2
Part of what is so great about dplyr is that it standardizes how we express data transformation processes. This makes for explicit, readable, if occasionally more verbose, code. The mutate_each approach is perfectly clear about what it's doing; your desired replacement is not. - Matthew Plourde

1 Answer


You want

df %>%
  foo(starts_with("varname"))

It can be solved with

df %>% select(starts_with("varname")) %>% foo
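Note that select() keeps only the matched columns, so unlike the mutate_each() call in the question, this pipeline drops every other column before foo ever sees the data. A quick check on the built-in iris data (used here only for illustration):

```r
library(dplyr)

# select() returns only the columns matched by the helper;
# the other three columns of iris are dropped
sepal_only <- iris %>% select(starts_with("Sepal"))

names(iris)        # Sepal.Length Sepal.Width Petal.Length Petal.Width Species
names(sepal_only)  # Sepal.Length Sepal.Width
```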

If you really want one single function:

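select_and_foo <- function(df, varname) {
    # keep only the columns whose names start with varname, then apply foo;
    # the last value is returned implicitly, so no explicit return is needed
    df %>% select(starts_with(varname)) %>% foo
}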

Then

df %>% select_and_foo("varname")
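To let the caller pass any selection helper rather than only a starts_with() prefix, the selection can be forwarded unevaluated with rlang's embrace operator {{ }}. A minimal sketch, assuming a dplyr/rlang version that supports {{ }} (rlang 0.4.0 or later); the foo() below is a hypothetical stand-in that converts every column to character:

```r
library(dplyr)

# Hypothetical stand-in for foo(): takes a data frame, returns a data frame
# with every column converted to character
foo <- function(df) {
  df[] <- lapply(df, as.character)
  df
}

# {{ cols }} forwards the caller's unevaluated selection to select(),
# so any helper (starts_with(), ends_with(), contains(), c(...)) works
select_and_foo <- function(df, cols) {
  df %>% select({{ cols }}) %>% foo()
}

res <- iris %>% select_and_foo(ends_with("Width"))
names(res)  # Sepal.Width Petal.Width
```

The call then reads like the one in the question, e.g. df %>% select_and_foo(starts_with("varname")).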

Example

# create sample data
library(dplyr)    # select(), starts_with(), %>%
library(magrittr) # set_names()
set.seed(16)
sampledf <- matrix(rnorm(50), ncol = 10) %>%
  as.data.frame() %>%
  set_names(paste0(c(rep("H", 5), rep("O", 5)), 1:10))

> sampledf
          H1          H2         H3         H4         H5        O6          O7         O8         O9         O10
1  0.4764134 -0.46841204  1.8471821 -1.6630805 -1.6477976 1.5274670 -0.67252558  0.2805551 -1.3253531  0.28390672
2 -0.1253800 -1.00595059  0.1119334  0.5759095 -0.3141739 1.0541781  0.13259853  0.5447834  2.0651357  0.12157699
3  1.0962162  0.06356268 -0.7460373  0.4727601 -0.1826816 1.0300710 -0.07092735  0.1308698  0.2421730  0.56634411
4 -1.4442290  1.02497260  1.6582137 -0.5427317  1.4704785 0.8401609 -0.94269547  0.2818444 -0.3490972  0.56903290
5  1.1478293  0.57314202  0.7217206  1.1276871 -0.8658988 0.2169647 -1.02203100 -0.2927308 -0.6308124 -0.09058676


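# define a function that operates on data frames and returns a data frame;
# ". %>% ..." builds a unary function, here equivalent to
# function(x) as.data.frame(t(solve(x)))
foo = . %>% solve %>% t %>% as.data.frame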

# et voila
> sampledf %>% select(starts_with("H")) %>% foo
          H1         H2        H3          H4          H5
1  0.3004043 -0.2011219 0.2233852 -0.28712106  0.07735365
2  0.3557121 -0.9996712 0.4830006  0.30975840  0.61582865
3  1.7020360 -0.8657610 0.4686327 -0.35050079  1.61728993
4  0.4993172 -0.2886113 0.4811486 -0.04590702  0.81210612
5 -0.2118682  0.4379735 0.1178755  0.42998574 -0.48759158
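If, as the comments clarify, the result should keep all columns the way mutate_each() does, the selection helper can be resolved inside the custom function instead. Two sketches, assuming dplyr 1.0.0 or later (in current dplyr, mutate_each() and funs() are deprecated and across() is the recommended replacement); the names to_character() and to_character_loop() are hypothetical:

```r
library(dplyr)

# Sketch 1: delegate to mutate(across()) (dplyr >= 1.0.0); every column is
# kept, only the selected ones are converted
to_character <- function(df, cols) {
  df %>% mutate(across({{ cols }}, as.character))
}

# Sketch 2: resolve the selection yourself. tidyselect::eval_select() returns
# a named integer vector with the positions of the matched columns, which
# the loop from the question can then use
to_character_loop <- function(df, cols) {
  pos <- tidyselect::eval_select(rlang::enquo(cols), df)
  for (i in pos) {
    df[[i]] <- as.character(df[[i]])
  }
  df
}

out1 <- iris %>% to_character(starts_with("Sepal"))
out2 <- iris %>% to_character_loop(starts_with("Sepal"))
```

Either version can be dropped into a pipeline as df %>% to_character(starts_with("varname")), which is the call shape the question asks for; the second one is closer to the original loop if foo needs to see the whole data frame.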