I would like to use aggregate to calculate simple summary metrics over multiple columns, based on certain conditions, e.g. only for those rows that are > 0. subsetting the data for values > 0 prior to using aggregate obviously won't work, as that will remove the entire row for all columns, even if only one zero occurs. See the following code for illustration:
idA <- c("A","A","A","A","A","B","B","B","B","B")
idB <- c("C","D","C","D","C","D","C","D","C","D")
colA <- c(0,2,3,0,0,3,9,5,6,1)
colB <- c(9,3,0,2,2,4,6,1,9,9)
colC <- c(0,0,5,7,3,9,8,1,2,3)
df <- data.frame(idA,idB,colA,colB,colC)
aggregate(.~idA+idB,df,FUN=NROW)
The aggregate command in this form, of course, makes no sense, as all the columns have an equal number of rows.
This is the result I'm looking for:
idA idB colA colB colC
A C 1 2 2
B C 2 2 2
A D 1 2 1
B D 3 3 3
Hence the need for a conditional statement, including only rows > 0. Alternatively, I'm sure there is a smart way of doing this via data.table. Any help would be much appreciated!
aggregate(.~ idA + idB, df , function(x) length(x[x > 0]))- David Arenburg