1
votes

I have a data.table, let's say test.dt, with a column called "id". The rows of column "id" refer to the column headers of a second data.table, say counts.dt. I would like to provide the length of each column in counts to match up with the corresponding row id in the original test.dt. For example:

test <- function() {
    library(data.table)
    test.dt <- data.table(id=c("a","b","c"),other=1:3)
    counts.dt <- data.table(a=c(1,NA,NA,NA),b=c(1,1,NA,NA),c=c(1,1,1,1),d=1:4,e=1:4)

    print(counts.dt)
    test.dt<-test.dt[,count:=sum(!is.na(counts.dt[,id]))]
    print(test.dt)
}

Which returns: counts.dt as expected:

    a  b c d e
1:  1  1 1 1 1
2: NA  1 1 2 2
3: NA NA 1 3 3
4: NA NA 1 4 4

However, test.dt seems to count not the number of non-NA elements in the columns of counts.dt, but the length of test.dt resulting in:

   id other count
1:  a     1     3
2:  b     2     3
3:  c     3     3

What I would like, is a table like:

   id other count
1:  a     1     1
2:  b     2     2
3:  c     3     4

Thoughts?

I have tried making my "sum" statement more complex with different eval functions with the same results. I haven't been able to find an answer to this particular question; any help or redirection to a similar question would be greatly appreciated.

UPDATE: My actual data had more longer text strings as IDs, using the answer as shown resulted in the following error:

Error in Math.factor(j) : abs not meaningful for factors

However, I was able to get things moving with the following:

get.length<-function(x){return(as.character(x))}
test.dt<-test.dt[,count:= sum(!is.na(counts.dt[,get.length(id),with=FALSE]),na.rm=TRUE),by=id]
1

1 Answers

2
votes

Try this:

test.dt[, count := sum(counts.dt[, id, with = F], na.rm = T), by = id]

Eventually, data.table may remove the with=FALSE option, in which case Matt's suggestion would still work: counts.dt[[id]]