I have a document-term matrix dtm, for example:
dtm
<<DocumentTermMatrix (documents: 50, terms: 50)>>
Non-/sparse entries: 220/2497
Sparsity : 100%
Maximal term length: 7
Weighting : term frequency (tf)
Now I want to convert it into a list of matrices, one per document. This is the input format required by the STM package:
[[1]]
[,1] [,2] [,3] [,4]
[1,] 23 33 42 117
[2,] 2 1 3 1
[[2]]
[,1] [,2] [,3] [,4]
[1,] 2 19 93 168
[2,] 2 2 1 1
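To make the target format concrete, here is a minimal sketch on a toy dense matrix. The data and the helper name `to_stm_doc` are made up for illustration; the assumption is that the first row of each result holds the column indices of the non-zero terms and the second row holds their counts, as in the output above.

```r
# Toy dense count matrix: 2 documents x 5 terms (made-up data).
toy <- matrix(c(0, 2, 0, 1, 0,
                3, 0, 0, 0, 1),
              nrow = 2, byrow = TRUE)

# Hypothetical helper: turn one document's count vector into a 2-row matrix.
# Row 1 = indices of the non-zero terms, row 2 = the corresponding counts.
to_stm_doc <- function(counts) {
  idx <- which(counts != 0)
  matrix(c(idx, counts[idx]), nrow = 2, byrow = TRUE)
}

# One 2-row matrix per document.
docs <- lapply(seq_len(nrow(toy)), function(r) to_stm_doc(toy[r, ]))
```

This works on a small dense matrix only; it is meant to show the shape of the result, not to scale to a large dtm.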
My idea is to find all the non-zero entries of dtm and collect them into matrices, one row (document) at a time:
mat = matrix()
dtm.to.mat = function(x){
mat[1,] = x[x != 0]
mat[2,] = colnames(x[x != 0])
return(mat)
}
matrix = list(apply(dtm, 1, dtm.to.mat))
However,
x[x != 0]
just won't work. The error says:
$ operator is invalid for atomic vectors
I was wondering why this is the case. If I convert x to a matrix beforehand, I do not get this error. However, my actual dtm has approximately 2,500,000 rows, so I fear that converting it will be very inefficient.
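One direction that might avoid the dense conversion is to work directly on the sparse triplet representation. The sketch below assumes the dtm is stored the way tm normally stores it, as a slam simple_triplet_matrix with fields `i` (row indices), `j` (column indices), `v` (values) and `nrow`. To keep the example self-contained it uses a plain list, `dtm_like`, as a stand-in for a real dtm, and `triplet_to_docs` is a hypothetical helper name.

```r
# Stand-in for a DocumentTermMatrix (assumption: slam triplet layout i/j/v).
dtm_like <- list(
  i = c(1L, 1L, 2L, 2L, 2L),   # document (row) index of each non-zero entry
  j = c(4L, 2L, 1L, 3L, 5L),   # term (column) index of each non-zero entry
  v = c(1, 2, 3, 1, 1),        # the counts themselves
  nrow = 3L, ncol = 5L         # document 3 has no non-zero entries
)

# Hypothetical helper: group the non-zero entries by document and build
# one 2-row matrix (term indices on top, counts below) per document.
triplet_to_docs <- function(x) {
  # A factor with explicit levels keeps documents that have no entries.
  doc <- factor(x$i, levels = seq_len(x$nrow))
  by_doc <- split(seq_along(x$i), doc)
  unname(lapply(by_doc, function(k) {
    k <- k[order(x$j[k])]      # sort each document's entries by term index
    matrix(c(x$j[k], x$v[k]), nrow = 2, byrow = TRUE)
  }))
}

docs <- triplet_to_docs(dtm_like)
```

Only the non-zero entries are touched here, so the cost should grow with the number of non-sparse entries rather than with documents times terms. I have not verified this on a dtm of the size mentioned above.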