3
votes

I am trying to to divide each value in columns B and C by the sum due to a factor in column A. The starting matrix could look something like this but has thousands of rows where A is a factor, and B and C contain the values:

A <- c(1,1,2,2)
B <- c(0.2, 0.3, 1, 0.5)
C <- c(0.7, 0.5, 0, 0.9)
M <- data.table(A,B,C) 

> M
     A   B   C
[1,] 1 0.2 0.7
[2,] 1 0.3 0.5
[3,] 2 1.0 0.0
[4,] 2 0.5 0.9 

The factors can occur any number of times. I was able to produce the sum per factor with library data.table:

library(data.table)
M.dt <- data.table(M)
M.sum <- M.dt[, lapply(.SD, sum), by = A]

> M.sum
   A   B   C
1: 1 0.5 1.2
2: 2 1.5 0.9

but didn't know how to go on from here to keep the original format of the table.

The resulting table should look like this:

B.1 <- c(0.4, 0.6, 0.666, 0.333)
C.1 <- c(0.583, 0.416, 0, 1)
M.1 <- cbind(A, B.1, C.1)

> M.1
     A   B.1     C.1
[1,] 1 0.400 0.58333
[2,] 1 0.600 0.41666
[3,] 2 0.666 0.00000
[4,] 2 0.333 1.00000

The calculation for the first value in B.1 would go like this: 0.2/(0.2+0.3) = 0.4 and so on, where the values to add are given by the factor in A.
I have some basic knowledge of R, but despite trying hard, I do badly with matrix manipulations and loops.

2
You might want to see the scale function as well. An awkward way to use it here: M[, as.data.table(scale(.SD, center = FALSE, scale = colSums(.SD))), by=A]Frank

2 Answers

3
votes

Simply divide each value in each column by its sum per each value in A

M[, lapply(.SD, function(x) x/sum(x)), A]
#    A         B         C
# 1: 1 0.4000000 0.5833333
# 2: 1 0.6000000 0.4166667
# 3: 2 0.6666667 0.0000000
# 4: 2 0.3333333 1.0000000

If you want to update by reference do

M[, c("B", "C") := lapply(.SD, function(x) x/sum(x)), A]

Or more generally

M[, names(M)[-1] := lapply(.SD, function(x) x/sum(x)), A]

A bonus solution for the dplyr junkies

library(dplyr)
M %>%
  group_by(A) %>%
  mutate_each(funs(./sum(.)))

# Source: local data table [4 x 3]
# Groups: A
# 
#   A         B         C
# 1 1 0.4000000 0.5833333
# 2 1 0.6000000 0.4166667
# 3 2 0.6666667 0.0000000
# 4 2 0.3333333 1.0000000
0
votes

Like most problems of this type, you can either use data.table or plyr package or some combination of split, apply, combine functions in base R.

For those who prefer the plyr package

library (plyr)
M <- data.table(A,B,C) 

ddply(M, .(A), colwise(function(x) x/sum(x)))

Output is:

  A         B         C
1 1 0.4000000 0.5833333
2 1 0.6000000 0.4166667
3 2 0.6666667 0.0000000
4 2 0.3333333 1.0000000