I am trying to understand the relationship between word2vec and doc2vec vectors in Gensim's implementation. In my application, I tag multiple documents with the same label (topic), and I train a doc2vec model on my corpus with dbow_words=1 so that word vectors are trained as well. I have been able to obtain similarities between word and document vectors this way, and the results make a lot of sense. For example, to get the document labels most similar to a word: doc2vec_model.docvecs.most_similar(positive=[doc2vec_model["management"]], topn=50)
My question, however, is about the theoretical interpretation of computing similarity between word2vec and doc2vec vectors. Is it safe to assume that when they are trained on the same corpus with the same dimensionality (d = 200), word vectors and document vectors can always be compared, either to find similar words for a document label or to find similar document labels for a word? Any suggestions or ideas are most welcome.
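To make the comparison concrete, here is a minimal sketch of the ranking step behind such a query, namely cosine similarity between one word vector and each document-label vector. It deliberately does not use Gensim. The helper names `cosine` and `rank_labels`, the label names, and the 3-dimensional toy vectors are all made up for illustration; real vectors would come from the trained model and have d = 200.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_labels(word_vec, doc_vecs, topn=50):
    """Return up to `topn` (label, similarity) pairs, most similar first."""
    scored = [(label, cosine(word_vec, vec)) for label, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Hypothetical toy data: one word vector and three document-label vectors
# that are assumed to live in the same vector space.
word_vec = [1.0, 0.0, 0.0]
doc_vecs = {
    "topic_business": [0.9, 0.1, 0.0],
    "topic_sports": [0.0, 1.0, 0.0],
    "topic_finance": [0.5, 0.5, 0.0],
}

ranking = rank_labels(word_vec, doc_vecs, topn=2)
for label, score in ranking:
    print(label, round(score, 3))
```

The sketch only shows that the comparison is mechanically possible whenever the dimensionalities match. Whether the resulting scores are meaningful is exactly the theoretical question being asked.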
Question 2: My other question is about the impact of a word's high or low frequency on the final word2vec model. If wordA and wordB have similar contexts within the set of documents sharing a particular doc label, but wordA has a much higher frequency than wordB, would wordB have a higher similarity score with the corresponding doc label or not? I am trying to train multiple word2vec models by sampling the corpus in a temporal fashion, and I want to know whether the following hypothesis holds: as a word becomes more and more frequent, assuming its context stays relatively similar, its similarity score with a document label also increases. Am I wrong to make this assumption? Any suggestions or ideas are very welcome.
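One way the hypothesis could be checked once the per-period models exist is sketched below, again in plain Python rather than Gensim. Everything in it is an assumption made for illustration: the `slices` structure, the period names, the counts, and the 2-dimensional vectors are invented, and `similarity_by_frequency` and `is_non_decreasing` are hypothetical helpers.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_by_frequency(slices):
    """Return (word_count, similarity) pairs sorted by word_count.

    Each similarity is computed inside one slice's own model, because
    separately trained models need not share a coordinate system.
    """
    pairs = [
        (s["word_count"], cosine(s["word_vec"], s["label_vec"]))
        for s in slices
    ]
    return sorted(pairs)

def is_non_decreasing(values):
    """True if every value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))

# Hypothetical per-period data: the word's count in that period, plus the
# word vector and doc-label vector taken from that period's model.
slices = [
    {"period": "t1", "word_count": 120, "word_vec": [1.0, 0.0], "label_vec": [0.6, 0.8]},
    {"period": "t2", "word_count": 450, "word_vec": [1.0, 0.0], "label_vec": [0.8, 0.6]},
    {"period": "t3", "word_count": 900, "word_vec": [1.0, 0.0], "label_vec": [1.0, 0.0]},
]

pairs = similarity_by_frequency(slices)
hypothesis_holds = is_non_decreasing([sim for _, sim in pairs])
print(pairs, hypothesis_holds)
```

With these invented numbers the similarity happens to rise with frequency, which says nothing about real data. The point is only the shape of the check: one similarity per time slice, ordered by the word's frequency in that slice.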
Thanks, Manish