Error in LDA in R: each row of the input matrix needs to contain at least one non-zero entry

Question

I am a beginner in text mining. When I run LDA() on a large dataset with 996,165 observations, I get the following error:

Error in LDA(dtm, k, method = "Gibbs", control = list(nstart = nstart, : Each row of the input matrix needs to contain at least one non-zero entry.

I am pretty sure there are no missing values in my corpus. My dtm has class "DocumentTermMatrix" / "simple_triplet_matrix", and tabulating is.na() on its first two components gives:

table(is.na(dtm[[1]]))
#FALSE 
#57100956 

table(is.na(dtm[[2]]))
#FALSE 
#57100956

I am a little confused about where the number 57100956 comes from. Since my dataset is so large, I don't know how to track down why this error occurs. My LDA command is:

ldaOut <- LDA(dtm, k, method = "Gibbs", control = list(nstart = nstart, seed = seed, best = best, burnin = burnin, iter = iter, thin = thin))

Can anyone provide some insights? Thanks.
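A likely reason the NA check passes: a simple_triplet_matrix (from the slam package, which tm's DocumentTermMatrix builds on) stores only the non-zero cells, as integer row indices i, column indices j and values v. So dtm[[1]] and dtm[[2]] should be the i and j vectors, and 57100956 would then be the number of stored non-zero entries, not the number of documents. An all-zero row is simply absent from i, so is.na() cannot reveal it. The base R sketch below mimics that layout on a made-up 4 x 3 matrix (the data and the `trip` list are illustrative assumptions, not the real dtm) and finds the empty row by looking for row numbers that never appear in i.

```r
# Toy document-term matrix: 4 documents x 3 terms (made-up data).
# Document 3 contains none of the terms, so its row is all zeros.
m <- matrix(c(1, 0, 2,
              0, 3, 0,
              0, 0, 0,
              1, 1, 0),
            nrow = 4, byrow = TRUE)

# Triplet (i, j, v) form, mimicking slam's simple_triplet_matrix:
# only the non-zero cells are stored.
nz <- which(m != 0, arr.ind = TRUE)
trip <- list(i = unname(nz[, "row"]), j = unname(nz[, "col"]), v = m[nz],
             nrow = nrow(m), ncol = ncol(m))

length(trip$i)        # 5: the number of non-zero entries, not of documents
table(is.na(trip$i))  # all FALSE: stored indices are never NA
empty_rows <- setdiff(seq_len(trip$nrow), trip$i)  # rows with no stored entry
empty_rows            # 3
```

On the real object the analogous check would be setdiff(seq_len(dtm$nrow), dtm$i); its length should be the number of empty documents that LDA() is complaining about.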

Francesco Dal Pont · Accepted Answer · 2016-07-09T12:45:38

In my opinion the problem is not missing values but rows that are entirely zero, i.e. documents that contain none of the terms in the matrix. To check for them:

row.sum <- apply(dtm, 1, FUN = sum)  # total term count of each row (document)

Then you can delete all rows that are all zeros with:

dtm <- dtm[row.sum != 0, ]

Now dtm should contain only rows with at least one non-zero entry.
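With about a million documents, note that apply() first converts a sparse DocumentTermMatrix into an ordinary dense matrix, which may not fit in memory. Because a simple_triplet_matrix stores only non-zero cells, a row should be non-empty exactly when its index appears in the i component (assuming, as is normal, that no explicit zeros are stored in v). The base R sketch below uses a made-up 5 x 4 triplet list (`trip`, an illustrative assumption) to check that this sparse test keeps the same rows as the row-sum test above.

```r
# Made-up triplet data for 5 documents x 4 terms; documents 2 and 5 are empty.
trip <- list(i = c(1L, 1L, 3L, 4L, 4L),
             j = c(1L, 3L, 2L, 2L, 4L),
             v = c(2, 1, 5, 1, 3),
             nrow = 5L, ncol = 4L)

# Sparse test: a row is non-empty exactly when its index appears in i.
keep_sparse <- sort(unique(trip$i))

# Dense test (row sums), only affordable here because the matrix is tiny.
m <- matrix(0, nrow = trip$nrow, ncol = trip$ncol)
m[cbind(trip$i, trip$j)] <- trip$v
keep_dense <- which(rowSums(m) != 0)

keep_sparse  # 1 3 4
keep_dense   # 1 3 4
```

On a real DocumentTermMatrix this would correspond to something like dtm[sort(unique(dtm$i)), ], or to computing the totals with slam::row_sums(dtm) instead of apply(); both should avoid the dense copy, but it is worth comparing them against the apply() result on a small subset first.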