Let's say you can find the vector R corresponding to an entire document using doc2vec. Let's also assume that, using word2vec, you can find the vector v corresponding to any word w. And finally, let's assume that R and v are in the same N-dimensional space. Given all this, you may then use plain old vector arithmetic to find some correlations between R and v.
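Just to make that concrete, here is a minimal sketch of how you might obtain such an R and v with gensim, assuming you use Doc2Vec in its PV-DM mode (dm=1), which trains word vectors and document vectors in one shared space; the toy corpus and the "doc0" tag are made up purely for illustration:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus, purely for illustration.
corpus = [
    TaggedDocument(words=["the", "cat", "sat", "on", "the", "mat"], tags=["doc0"]),
    TaggedDocument(words=["dogs", "chase", "cats", "in", "the", "yard"], tags=["doc1"]),
]

# dm=1 (PV-DM) trains word vectors alongside document vectors in the same space.
model = Doc2Vec(corpus, vector_size=50, min_count=1, dm=1, epochs=40)

R = model.dv["doc0"]  # the document vector
v = model.wv["cat"]   # a word vector of the same dimensionality
```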
For starters, you can normalize v. Normalizing, after all, is just dividing each dimension by the magnitude of v (i.e. |v|). Let's call the normalized version of v v_normal.
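In code, assuming numpy, that normalization is a one-liner:

```python
import numpy as np

# Divide v by its Euclidean magnitude |v| to get a unit-length vector.
v_normal = v / np.linalg.norm(v)
```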
Then, you may project v_normal onto the line represented by the vector R. That projection is essentially the dot product of v_normal and R (strictly speaking, the scalar projection is that dot product divided by |R|, but |R| is the same for every word of a given document, so it does not change how the words compare to each other). Let's call the scalar result of the dot product len_projection. You can consider len_projection / |v_normal| as an indication of how parallel the context of the word is to that of the overall document. In fact, considering just len_projection is enough, because v_normal is normalized, so |v_normal| == 1.
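Continuing the numpy sketch from above, that projection value is simply:

```python
# Dot product of the normalized word vector with the document vector R.
# No division by |v_normal| is needed, since |v_normal| == 1.
len_projection = np.dot(v_normal, R)
```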
Now, you may apply this procedure to all words in the document, and consider the words that produce the greatest len_projection values to be the most significant words of that document.
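Putting the pieces together, a rough sketch of that ranking step could look like the following (the model and its dv/wv lookups are carried over from the gensim sketch above; treat this as illustrative rather than a tested recipe):

```python
import numpy as np

def rank_words(model, doc_words, doc_tag):
    """Rank a document's words by the projection of each word vector onto R."""
    R = model.dv[doc_tag]
    scores = {}
    for w in set(doc_words):
        if w not in model.wv:
            continue  # skip words the model has no vector for
        v = model.wv[w]
        v_normal = v / np.linalg.norm(v)
        scores[w] = float(np.dot(v_normal, R))  # len_projection for this word
    # Largest projection first: the words most "parallel" to the document.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(rank_words(model, ["the", "cat", "sat", "on", "the", "mat"], "doc0"))
```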
Note that this method may end up finding frequently used words like "I" or "and" as the most important words in a document, since such words appear in many different contexts. If that is an issue you want to remedy, you may want to add a post-processing step that filters out such common words.
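One crude way to do that filtering, with a hand-picked stop word list standing in for whatever list you prefer (NLTK's, for example):

```python
# A tiny, hand-picked stop word list; swap in a proper list in practice.
STOP_WORDS = {"i", "and", "the", "a", "an", "of", "to", "in", "on"}

ranked = rank_words(model, ["the", "cat", "sat", "on", "the", "mat"], "doc0")
ranked = [(w, score) for (w, score) in ranked if w.lower() not in STOP_WORDS]
```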
I sort of thought of this method on the spot here, and I am not sure whether this approach has any scientific backing. But it may make sense if you think about how most vector embeddings for words work. Word vectors are usually trained to represent the context in which a word is used. Thinking in terms of vector arithmetic, projecting the word's vector onto a line may reveal how parallel the context of that word w is to the overall context represented by that line.
Last but not least, as I have only worked with word2vec before, I am not sure whether doc2vec and word2vec data can be used jointly like I described above. As I stated in the first paragraph of my answer, it is really critical that R and v are in the same N-dimensional space.