R Creating co-occurrence matrix

Question

My question is about text mining and text processing. I would like to build a co-occurrence matrix from my data, where each row pairs a paper with one of the references it cites. My data is:

dat <- read.table(text="id_reférence id_paper
        621107   621100
        621100   621101
        621107   621102
        621109   621103
        621105   621104
        621103   621105
        621109   621106
        621106   621107
        621107   621108
        621106   621109", header=TRUE)

expected <- matrix(0,10,10)
### Rows are citing papers, columns are cited papers:
### article 1 (621100) has been cited by article 2 (621101)
expected[2, 1] <- 1

Thanks in advance :)

Ozan147 · Accepted Answer · 2018-11-24T20:22:01

# sorted vector of every distinct id; an id's position in it is its row/column index
# (the raw ids such as 621100 cannot be used directly as indices of a 10 x 10 matrix)
ids <- sort(unique(c(dat$id_paper, dat$id_reférence)))

# loop through the observations of dat
for(i in seq_len(nrow(dat))) {
  # split the reference field on commas and convert the ids to integer
  # this also handles a cell holding several comma-separated references
  ref <- as.integer(strsplit(as.character(dat$id_reférence[i]), ",")[[1]])
  # loop through the list of references
  for(j in ref) {
    # mark the citation at (row, column) ~ (citing paper, cited reference)
    expected[match(dat$id_paper[i], ids), match(j, ids)] <- 1
  }
}

expected
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    0    0    0    0    0    0    0    1    0     0
# [2,]    1    0    0    0    0    0    0    0    0     0
# [3,]    0    0    0    0    0    0    0    1    0     0
# [4,]    0    0    0    0    0    0    0    0    0     1
# [5,]    0    0    0    0    0    1    0    0    0     0
# [6,]    0    0    0    1    0    0    0    0    0     0
# [7,]    0    0    0    0    0    0    0    0    0     1
# [8,]    0    0    0    0    0    0    1    0    0     0
# [9,]    0    0    0    0    0    0    0    1    0     0
# [10,]   0    0    0    0    0    0    1    0    0     0
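
If each row of the data holds exactly one reference, as in the sample in the question, the double loop can probably be replaced by a single matrix-indexed assignment. The following is only a sketch under that assumption. The names `ids` and `adj` are mine, and I spelled the first column `id_reference` (without the accent) so that the snippet does not depend on the locale:

# sample data from the question, with an ASCII column name (my assumption)
dat <- read.table(text = "id_reference id_paper
621107 621100
621100 621101
621107 621102
621109 621103
621105 621104
621103 621105
621109 621106
621106 621107
621107 621108
621106 621109", header = TRUE)

# every distinct id, sorted; an id's position is its row/column index
ids <- sort(unique(c(dat$id_paper, dat$id_reference)))

# square matrix of zeros, labelled with the ids so cells can be read by name
adj <- matrix(0, nrow = length(ids), ncol = length(ids),
              dimnames = list(citing = as.character(ids),
                              cited = as.character(ids)))

# a two-column index matrix addresses all (row, column) pairs at once
adj[cbind(match(dat$id_paper, ids), match(dat$id_reference, ids))] <- 1

# paper 621101 cites reference 621100
adj["621101", "621100"]

Labelling the dimensions with the ids makes it easier to check individual citations by name, at the cost of a wider printout than the plain 10 x 10 matrix.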
