I have a large dataset with columns IDNum, Var1, Var2, Var3, Var4, Var5, Var6. The variables are boolean, with values of either 0 or 1, so each row is one of 64 (2^6) possible combinations. I would like to count the number of rows for each combination that is present. Is there an efficient way to write this in R?
2 Answers
aggregate can do this. Here's a smaller example:
r <- function() rbinom(10, 1, .5)
d <- data.frame(IDNum=1:10, Var1=r(), Var2=r())
d
IDNum Var1 Var2
1 1 0 1
2 2 0 1
3 3 0 0
4 4 1 0
5 5 1 1
6 6 0 0
7 7 1 1
8 8 1 0
9 9 0 1
10 10 0 1
Now to count the number of each combination:
> aggregate(d$IDNum, d[-1], FUN=length)
Var1 Var2 x
1 0 0 2
2 1 0 2
3 0 1 4
4 1 1 2
The values in d$IDNum aren't actually used here, but something must be passed to the length function. aggregate splits d$IDNum by each combination of Var1 and Var2 and passes each group to length, which returns the number of rows in that group.
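To illustrate that the values passed in really don't matter, here is a minimal sketch that counts with a dummy vector instead of d$IDNum. The data values and the column name n are made up for the illustration; only aggregate and length come from the answer above.

```r
# Small fixed data set so the result is reproducible (values are illustrative).
d <- data.frame(IDNum = 1:6,
                Var1  = c(0, 0, 1, 1, 0, 1),
                Var2  = c(1, 1, 0, 1, 0, 1))

# Any vector with one element per row works as the first argument,
# because length() only looks at how many elements fall into each group.
counts <- aggregate(rep(1L, nrow(d)), by = d[-1], FUN = length)

# aggregate() names the result column "x"; "n" is a hypothetical, clearer name.
names(counts)[names(counts) == "x"] <- "n"
counts
```

Only the combinations that occur in the data get a row, and the counts should add up to nrow(d).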
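This will give a slightly different result: it lists every possible combination, with a count of 0 for those that are not present. (table builds the combinations from the values it finds in each column, so you only get all 64 if every column contains both a 0 and a 1.) Example data: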
nam <- c("IDNum",paste0("Var",1:6))
n <- 5
set.seed(23)
dat <- setNames(data.frame(1:n,replicate(6,sample(0:1,n,replace=TRUE))),nam)
# IDNum Var1 Var2 Var3 Var4 Var5 Var6
#1 1 1 0 1 0 1 1
#2 2 0 1 1 1 0 1
#3 3 0 1 0 1 0 1
#4 4 1 1 0 1 1 0
#5 5 1 1 1 1 0 1
Count 'em up:
data.frame(table(dat[-1]))
# Var1 Var2 Var3 Var4 Var5 Var6 Freq
#1 0 0 0 0 0 0 0
#...
#28 1 1 0 1 1 0 1
#...
#43 0 1 0 1 0 1 1
#...
#47 0 1 1 1 0 1 1
#48 1 1 1 1 0 1 1
#...
#54 1 0 1 0 1 1 1
#...
#64 1 1 1 1 1 1 0
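Two variations on the table approach may be useful; this is a sketch under assumptions, not part of the original answer. Converting each column to a factor with levels 0 and 1 should guarantee the full grid even when a column happens to contain only one value, and filtering on Freq > 0 gets back to only the combinations that are present. The data below (three variables, with Var3 all zeros) and the names vars, all_counts and present are made up for the illustration.

```r
# Illustrative data: Var3 is all zeros, so table() alone would
# see only one level for it and list 4 combinations instead of 8.
dat <- data.frame(IDNum = 1:4,
                  Var1  = c(0, 1, 1, 0),
                  Var2  = c(1, 1, 0, 0),
                  Var3  = c(0, 0, 0, 0))

# Force both levels on every variable so all 2^3 combinations appear.
vars <- dat[-1]
vars[] <- lapply(vars, factor, levels = 0:1)

all_counts <- as.data.frame(table(vars))    # every combination, zeros included
present    <- subset(all_counts, Freq > 0)  # only combinations that occur
```

With six variables the same code would give a 64-row all_counts, and present should match what aggregate reports.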