0
votes

Hi guys I have a question regarding an matrix operation in R. I have a data set like the one bellow:

Sample Data:

d <- data.frame(id=c(2,3,4,5,6,6,8,11,11,11,12,12,12),author=c("FN","VM","VA","FK","VM","SM","FK","FK","VB","VA","FK","VB","VA"))
d
   id author
1   2     FN
2   3     VM
3   4     VA
4   5     FK
5   6     VM
6   6     SM
7   8     FK
8  11     FK
9  11     VB
10 11     VA
11 12     FK
12 12     VB
13 12     VA

1)Created an Incidence Matrix:

> m <- xtabs(~author+id,d)
> m
      id
author 2 3 4 5 6 8 11 12
    FK 0 0 0 1 0 1  1  1
    FN 1 0 0 0 0 0  0  0
    SM 0 0 0 0 1 0  0  0
    VA 0 0 1 0 0 0  1  1
    VB 0 0 0 0 0 0  1  1
    VM 0 1 0 0 1 0  0  0

What I want to do is to generate pair combinations from the author list, in column 2, by multiplying each row. For instance for the pair FK-VA, its corresponding rows in the incidence matrix are this:

FK 0 0 0 1 0 1  1  1
VA 0 0 1 0 0 0  1  1

The expected outcome in my matrix should produce a multiplication by each element of the rows:

FK-VA (0*0),(0*0),(0*1),(1*0),(0*0),(1*0),(1*1),(1*1)
FK-VA   0   0   0   0   0   1   1

2)Expected outcome would be this matrix(m):

FK  FN  0   0   0   0   0   0   0
FK  SM  0   0   0   0   0   0   0
FK  VA  0   0   0   0   0   1   1
FK  VB  0   0   0   0   0   1   1
FK  VM  0   0   0   0   0   0   0
FN  SM  0   0   0   0   0   0   0
FN  VA  0   0   0   0   0   0   0
FN  VB  0   0   0   0   0   0   0
FN  VM  0   0   0   0   0   0   0
SM  VA  0   0   0   0   0   0   0
SM  VB  0   0   0   0   0   0   0
SM  VM  0   0   0   1   0   0   0
VA  VB  0   0   0   0   0   1   1
VA  VM  0   0   0   0   0   0   0
VB  VM  0   0   0   0   0   0   0

3) Delete Empty rows.

As you can see I need help for the steps 2 and 3.

Thank you

Mario

1
Are you sure the expected matrix is correct?akrun
I did it manually in excel, possible that contain some mistakes, but the idea as I explained, is create pairs by the "author" column, an multiply the rows. Generate a new matrix with this pairs. At least the dimensions of the example matrix are right.Mario GS
I got a similar matrix but the values were not the same as your expected one.akrun
can you explain me the procedure?Mario GS
I posted a solution, but I am not sure how you want to merge the id when the pairwise author list is created. I used one column as reference.akrun

1 Answers

0
votes

May be this helps

indx <- combn(dimnames(m)$author,2)
res <- cbind(t(indx), as.data.frame(do.call(rbind,
       lapply(split(indx, col(indx)), function(x) m[x[1],]*m[x[2],]))))

colnames(res)[1:2] <- paste0('author', 1:2)
head(res,3)
#  author1 author2 2 3 4 5 6 8 11 12
#1      FK      FN 0 0 0 0 0 0  0  0
#2      FK      SM 0 0 0 0 0 0  0  0
#3      FK      VA 0 0 0 0 0 0  1  1

Or

  cbind(t(indx),as.data.frame(t(combn(dimnames(m)$author,2,
                           FUN=function(x) m[x[1],] * m[x[2],]))))

If you want to subset the rows that have at least some value other than 0

res1 <- res[!!rowSums(res[,-(1:2)]),]

Update

For the sum, you can do rowSums

res$Sum <- rowSums(res[,-(1:2)])
head(res,3)
 #  author1 author2 2 3 4 5 6 8 11 12 Sum
 #1      FK      FN 0 0 0 0 0 0  0  0   0
 #2      FK      SM 0 0 0 0 0 0  0  0   0
 #3      FK      VA 0 0 0 0 0 0  1  1   2

Update2

Regarding the second question of dividing res1 by CL

 CL <- colSums(res1[,-(1:2)])
 CL <- CL-1
 CL[ CL<1 ] <- 0
 res1[-(1:2)]/CL[col(res1[-(1:2)])]

Update3

Regarding the new dataset,

   d <- read.csv('AuthorsRevised.csv', stringsAsFactors=FALSE)
   m <- xtabs(~Authors+ID,d)
   indx <- combn(dimnames(m)$Authors,2)
   dim(indx)
   #[1]      2 435711

   res <- cbind(t(indx), as.data.frame(do.call(rbind,
        lapply(split(indx, col(indx)), function(x) m[x[1],]*m[x[2],]))))

   colnames(res)[1:2] <- paste0('author', 1:2)

   dim(res)
   #[1] 435711    534
   res[1:3,1:3]
   #    author1          author2 1
   #1 Abe S.-i.    Achterberg W. 0
   #2 Abe S.-i. Adebowale B.O.A. 0
   #3 Abe S.-i.        Aghion P. 0

Another option if you just want the sum would be

   t1 <- crossprod(table(d))
   t1[upper.tri(t1, diag=TRUE)] <- NA
   library(reshape2)
   res1 <- melt(t1, na.rm=TRUE)[,c(2:1,3)]