89
votes

I have a very large dataframe with rows as observations and columns as genetic markers. I would like to create a new column that contains the sum of a select number of columns for each observation using R.

If I have 200 columns and 100 rows, I would like a to create a new column that has 100 rows with the sum of say columns 43 through 167. The columns have either 1 or 0. With the new column that contains the sum of each row, I will be able to sort the individuals who have the most genetic markers.

I feel it is something close to:

data$new=sum(data$[,43:167])
4

4 Answers

118
votes

you can use rowSums

rowSums(data) should give you what you want.

41
votes

The rowSums function (as Greg mentions) will do what you want, but you are mixing subsetting techniques in your answer, do not use "$" when using "[]", your code should look something more like:

data$new <- rowSums( data[,43:167] )

If you want to use a function other than sum, then look at ?apply for applying general functions accross rows or columns.

7
votes

I came here hoping to find a way to get the sum across all columns in a data table and run into issues implementing the above solutions. A way to add a column with the sum across all columns uses the cbind function:

cbind(data, total = rowSums(data))

This method adds a total column to the data and avoids the alignment issue yielded when trying to sum across ALL columns using the above solutions (see the post below for a discussion of this issue).

Adding a new column to matrix error

1
votes

This could also help, however the best option is beyond any doubt the rowSums function:

data$new <- Reduce(function(x, y) {
  x + data[, y]
}, init = data[, 43], 44:167)