0
votes

Creating a new column with mutate that is a function of a specified set of columns, computed for each row of a data frame

This seems like it should be a simple task, but I've been struggling to find the right syntax. I'm after something like:

df <- data.frame("annotations"=c("some","information","in","columns"),
           "X001"=c(124,435,324,123),
           "X002"=c(486,375,156,375)) 

df %>% mutate(median=median(select(.,starts_with("X"))))

The goal is to get the original data frame with a new column 'median' that holds, for each row, the median across all columns starting with 'X'. I think I might need a rowwise() in there somewhere.

I'm trying to fit this into a larger dplyr pipeline, so I'm looking for solutions within the tidyverse.

2 Answers

1
votes

You can use pmap_dbl (from purrr) over the X columns; it calls the function once per row, passing that row's values as arguments:

library(tidyverse)
df %>% 
  mutate(median = pmap_dbl(select(., starts_with("X")),
                           ~ median(c(...))))

Or use apply with MARGIN = 1 to work row by row:

df %>% 
  mutate(median = apply(select(., starts_with("X")), 1, median))
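
The rowwise() idea from the question can also be made to work. The following is a minimal sketch, assuming dplyr 1.0.0 or later (where c_across() is available); the name `result` is only for illustration:

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(annotations = c("some", "information", "in", "columns"),
                 X001 = c(124, 435, 324, 123),
                 X002 = c(486, 375, 156, 375))

result <- df %>%
  rowwise() %>%                                         # treat each row as its own group
  mutate(median = median(c_across(starts_with("X")))) %>%  # median of the X columns per row
  ungroup()                                             # drop the row-wise grouping again
```

The ungroup() at the end matters inside a longer pipeline: without it, later verbs would keep operating one row at a time.
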
0
votes

Another way, which doesn't use dplyr (it relies on %like% from data.table to pick the columns):

library(data.table)
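# columns whose names start with X (%like% takes a regular expression,
# so "^X" anchors the match to the start of the name)
df[, names(df) %like% "^X"]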

# output
    X001 X002
 1  124  486
 2  435  375
 3  324  156
 4  123  375


# get the median of each row using apply (MARGIN = 1 means by row)
apply(df[, names(df) %like% "^X"], 1, median)
#output - median of each row
305 405 240 249 

# store the results in a new column
df$median <- apply(df[, names(df) %like% "^X"], 1, median)

# output
annotations X001 X002 median
1        some  124  486    305
2 information  435  375    405
3          in  324  156    240
4     columns  123  375    249
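
If loading data.table only for %like% is undesirable, the same column selection can probably be done in base R alone. This is a sketch assuming R 3.3.0 or later (for startsWith()); `x_cols` is just an illustrative name:

```r
# Same example data as in the question
df <- data.frame(annotations = c("some", "information", "in", "columns"),
                 X001 = c(124, 435, 324, 123),
                 X002 = c(486, 375, 156, 375))

# Logical vector marking the columns whose names start with "X"
x_cols <- startsWith(names(df), "X")

# Row-wise median over just those columns, stored as a new column
df$median <- apply(df[x_cols], 1, median)
```
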