I'm trying to cluster similar documents in R. As a first step, I compute the term-document matrix for my set of documents. Then I create the latent semantic space from that term-document matrix. I decided to use LSA in my experiment because the results of clustering on the raw term-document matrix alone were awful. Is it possible to build a dissimilarity matrix (with the cosine measure) from the LSA space I created? I need this because the clustering algorithm I'm using requires a dissimilarity matrix as input.
Here is my code:
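library(cluster)  # provides pam()
library(lsa)      # provides textmatrix(), lsa() and dimcalc_share()
# Build the term-document matrix from the files in the directory
myMatrix <- textmatrix("/home/user/DocmentsDirectory")
# Project the term-document matrix into a latent semantic space
myLSAspace <- lsa(myMatrix, dims = dimcalc_share())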
I need to build a dissimilarity matrix (using the cosine measure) from the LSA space, so that I can call the clustering algorithm as follows:
clusters <- pam(dissimilarityMatrix, 10, diss = TRUE)
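To make the goal concrete, here is a minimal sketch of what I think the missing step might look like. It uses a made-up 5x4 toy matrix and base R's svd() in place of lsa(), so it runs without my corpus. The toy data, k = 2, and the names tdm, docCoords and cosSim are only for illustration. My assumption is that with the lsa package the document coordinates would be myLSAspace$dk %*% diag(myLSAspace$sk), but I'm not sure that is the right way to read the LSA space.

```r
library(cluster)  # provides pam()

# Hypothetical toy term-document matrix (rows = terms, columns = documents)
tdm <- matrix(c(2, 0, 1, 0,
                1, 0, 2, 0,
                0, 3, 0, 1,
                0, 1, 0, 2,
                1, 1, 1, 1),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("term", 1:5), paste0("doc", 1:4)))

# Truncated SVD with k dimensions, standing in for lsa() (assumption)
k <- 2
s <- svd(tdm)

# Document coordinates in the reduced space: one row per document
docCoords <- s$v[, 1:k] %*% diag(s$d[1:k])
rownames(docCoords) <- colnames(tdm)

# Cosine similarity between every pair of document rows
norms <- sqrt(rowSums(docCoords^2))
cosSim <- (docCoords %*% t(docCoords)) / (norms %o% norms)

# Cosine dissimilarity; pmax() clamps tiny negative rounding errors to 0
dissimilarityMatrix <- as.dist(pmax(1 - cosSim, 0))

# Cluster on the dissimilarities (2 clusters for the toy data)
clusters <- pam(dissimilarityMatrix, 2, diss = TRUE)
```

If that assumption holds, I would expect the same 1 - cosine step applied to the real LSA space to give an object that pam() accepts with diss=TRUE.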
Any suggestions?
Thanks in advance!