I am a starter in text mining topic. When I run LDA() over a huge dataset with 996165 observations, it displays the following error:
Error in LDA(dtm, k, method = "Gibbs", control = list(nstart = nstart, : Each row of the input matrix needs to contain at least one non-zero entry.
I am pretty sure that there is no missing values in my corpus and also. The table of "DocumentTermMatrix" and "simple_triplet_matrix" is:
table(is.na(dtm[[1]]))
#FALSE
#57100956
table(is.na(dtm[[2]]))
#FALSE
#57100956
A little confused how "57100956" comes. But as my dataset is pretty large, I don't know how to check why does this error occurs. My LDA command is:
ldaOut<-LDA(dtm,k, method="Gibbs", control=list(nstart=nstart, seed = seed, best=best, burnin = burnin, iter = iter, thin=thin))
Can anyone provide some insights? Thanks.