
My goal is to sum certain columns of my data frame and store that sum in a new column.

Suppose I have the following data.frame:

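df <- data.frame(names = c("a","b","c","d","e","f"),
                 wb01 = c(1,1,0,1,1,0), wb02 = c(0,0,0,0,1,1), wb03 = c(0,0,1,1,1,1),
                 wb04 = c(1,1,0,1,1,1), wb05 = c(1,0,1,0,0,1), wb06 = c(1,1,1,1,1,1))

rownames(df) <- df$names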

  wb01 wb02 wb03 wb04 wb05 wb06
a  1    0    0    1    1    1
b  1    0    0    1    0    1
c  0    0    1    0    1    1
d  1    0    1    1    0    1
e  1    1    1    1    0    1
f  0    1    1    1    1    1

I would like to select which columns to sum using a vector that contains the names of those columns. (My real data frame is quite large, as is the number of columns I will be choosing, and those columns are not bunched together, i.e. I can't just choose columns 3-5, nor do I want to type each column name since there would be over 2k...)

But back to the example, here are the columns I'd like to sum:

genelist <- c("wb02", "wb03", "wb06")

So the results would look like this:

  wb01 wb02 wb03 wb04 wb05 wb06 sum_genelist
a  1    0    0    1    1    1         1
b  1    0    0    1    0    1         1
c  0    0    1    0    1    1         2
d  1    0    1    1    0    1         2
e  1    1    1    1    0    1         3
f  0    1    1    1    1    1         3

Thanks for any help or tips!

For the columns mentioned, is there any pattern in the names? - akrun
I posted the data used. It is working for me. - akrun

2 Answers


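We can use rowSums on the subset of columns named in genelist; intersect() keeps only the names that actually exist in df: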

df$sum_genelist <- rowSums(df[intersect(genelist, names(df))], na.rm = TRUE)
#  names wb01 wb02 wb03 wb04 wb05 wb06 sum_genelist
#a     a    1    0    0    1    1    1            1
#b     b    1    0    0    1    0    1            1
#c     c    0    0    1    0    1    1            2
#d     d    1    0    1    1    0    1            2
#e     e    1    1    1    1    0    1            3
#f     f    0    1    1    1    1    1            3


Data used:

genelist <- c('wb02', 'wb03', 'wb06')


df <- structure(list(names = c("a", "b", "c", "d", "e", "f"), wb01 = c(1, 
1, 0, 1, 1, 0), wb02 = c(0, 0, 0, 0, 1, 1), wb03 = c(0, 0, 1, 
1, 1, 1), wb04 = c(1, 1, 0, 1, 1, 1), wb05 = c(1, 0, 1, 0, 0, 
1), wb06 = c(1, 1, 1, 1, 1, 1)), row.names = c("a", "b", "c", 
"d", "e", "f"), class = "data.frame")
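Because intersect() drops any name in genelist that is not a column of df without telling you, a typo in a 2k-name list could go unnoticed. Below is a minimal base R sketch that reports the dropped names before summing; "wb99" is a made-up name added to genelist purely to show the check firing.

```r
# Example data from the question
df <- data.frame(names = c("a", "b", "c", "d", "e", "f"),
                 wb01 = c(1, 1, 0, 1, 1, 0), wb02 = c(0, 0, 0, 0, 1, 1),
                 wb03 = c(0, 0, 1, 1, 1, 1), wb04 = c(1, 1, 0, 1, 1, 1),
                 wb05 = c(1, 0, 1, 0, 0, 1), wb06 = c(1, 1, 1, 1, 1, 1))
rownames(df) <- df$names

# "wb99" is a hypothetical name that is deliberately absent from df
genelist <- c("wb02", "wb03", "wb06", "wb99")

# Names that intersect() would otherwise drop silently
missing_cols <- setdiff(genelist, names(df))
if (length(missing_cols) > 0) {
  message("Columns not found in df: ", paste(missing_cols, collapse = ", "))
}

# Sum only the columns that actually exist, row by row
present_cols <- intersect(genelist, names(df))
df$sum_genelist <- rowSums(df[present_cols], na.rm = TRUE)
print(df)
```

The sums are the same as above; the only addition is the message listing wb99.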

You can use any_of() from dplyr to select only those columns that are present in your data; names in genelist that are not columns (such as 'a' below) are ignored.

library(dplyr)

genelist <- c('wb02', 'wb03', 'wb06', 'a')
df %>% mutate(sum_genelist = rowSums(select(., any_of(genelist))))

#  names wb01 wb02 wb03 wb04 wb05 wb06 sum_genelist
#1     a    1    0    0    1    1    1            1
#2     b    1    0    0    1    0    1            1
#3     c    0    0    1    0    1    1            2
#4     d    1    0    1    1    0    1            2
#5     e    1    1    1    1    0    1            3
#6     f    0    1    1    1    1    1            3
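The select(., ...) call relies on the magrittr dot referring to the piped data frame. A sketch of the same idea written with across() inside mutate(), assuming dplyr 1.0.0 or later (in dplyr 1.1.0 and later, pick() serves the same purpose):

```r
library(dplyr)

# Example data from the question
df <- data.frame(names = c("a", "b", "c", "d", "e", "f"),
                 wb01 = c(1, 1, 0, 1, 1, 0), wb02 = c(0, 0, 0, 0, 1, 1),
                 wb03 = c(0, 0, 1, 1, 1, 1), wb04 = c(1, 1, 0, 1, 1, 1),
                 wb05 = c(1, 0, 1, 0, 0, 1), wb06 = c(1, 1, 1, 1, 1, 1))

# "a" is a value in the names column, not a column name, so any_of() skips it
genelist <- c("wb02", "wb03", "wb06", "a")

# across() given only a column selection returns those columns unchanged;
# rowSums() then adds them up row by row
df <- df %>% mutate(sum_genelist = rowSums(across(any_of(genelist))))
print(df)
```

Swapping any_of() for all_of() makes the call fail with an error when a name in genelist is not a column, which may be preferable if every name in the list is expected to exist.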