(For the following, I could either an R data.frame or R data.table. Both are ok.)
I have the following data.table:
library(data.table)
dt = data.table(V1=c("dog", "dog", "cat", "cat", "cat", "bird","bird","bird","bird"),
V2=rep(42, 9), V3=c(1, 2, 4, 5, 7, 1, 2, 5, 8))
> print(dt)
V1 V2 V3
1: dog 42 1
2: dog 42 2
3: cat 42 4
4: cat 42 5
5: cat 42 7
6: bird 42 1
7: bird 42 2
8: bird 42 5
9: bird 42 8
Column V3
contains integers from 1 to 8. My goal is to populate an 8 by 8 zero matrix with the count of each combination "pair" given the unique category in column V1
So, the combination pairs for dog
, cat
, and bird
are:
dog: (1, 2)
cat: (4, 5), (4, 7), (5, 7)
bird: (1, 2), (1, 5), (1, 8), (2, 5), (2, 8), (5, 8)
For each pair, I add +1
to the corresponding entry in the zero matrix. For this matrix, (n, m) = (m, n)
. The matrix given dt
would be:
1 2 3 4 5 6 7 8
1: 0 2 0 0 1 0 0 1
2: 2 0 0 0 1 0 0 1
3: 0 0 0 0 0 0 0 0
4: 0 0 0 0 1 0 1 0
5: 1 1 0 1 0 0 1 1
6: 0 0 0 0 0 0 0 0
7: 0 0 0 1 1 0 0 0
8: 1 1 0 0 1 0 0 0
Note that (1,2)=(2,1)
has a count 2, from the dog
combination and the bird
combination.
(1) Is there a method to calculate the combinations of values in an R data.table/data.frame column, given the unique value in another column?
Perhaps it would make sense to output an R list, with vector "pairs", e.g.
list(c(1, 2), c(2, 1), c(4, 5), c(4, 7), c(5, 7), c(5, 4), c(7, 4), c(7, 5),
c(1, 2), c(1, 5), c(1, 8), c(2, 5), c(2, 8), c(5, 8), c(2, 1), c(5, 1),
c(8, 1), c(5, 2), c(8, 2), c(8, 5))
However, I'm not sure how I would use this to populate a matrix...
(2) Given the input data.table/data.frame, what would be the most efficient data-structure to use to write out a matrix, as soon above?