I have a set of document vectors generated using gensim doc2vec (~500K vectors of 150 dimensions). I want to cluster similar documents, so I want to generate an n*n similarity matrix on which I can run my clustering algorithm.
I followed the instructions in this issue, https://github.com/RaRe-Technologies/gensim/issues/140, using gensim.similarities, but the output for 500K records was a 500K*150 matrix. I don't understand the output. Shouldn't it be 500K*500K? Am I missing something?
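
To make the shape I expect concrete, here is a minimal sketch in plain NumPy rather than gensim. The random array is only a hypothetical stand-in for my doc2vec vectors, and `cosine_similarity_matrix` is a helper name I made up for illustration, not a gensim function.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Return the n*n cosine similarity matrix for an (n, d) array."""
    vectors = np.asarray(vectors, dtype=np.float32)
    # Normalise each row to unit length so a dot product equals cosine similarity.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = vectors / norms
    # (n, d) times (d, n) gives (n, n): one similarity per pair of documents.
    return unit @ unit.T

# Hypothetical stand-in data: 1000 "documents" of 150 dimensions.
rng = np.random.default_rng(0)
doc_vectors = rng.normal(size=(1000, 150))

sims = cosine_similarity_matrix(doc_vectors)
print(sims.shape)  # (1000, 1000)
```

This works for a small sample, but a dense 500K*500K matrix of float32 values would need roughly 1 TB of memory, so I assume the real thing would have to be computed in chunks or kept sparse.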