The infer_vector() method will train up a doc-vector for a new text, which should be a list of tokens that were preprocessed just like the training texts.
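For example, a minimal sketch (the model filename and example text are hypothetical, and simple_preprocess stands in for whatever preprocessing you actually applied to the training texts):

```python
from gensim.models import Doc2Vec
from gensim.utils import simple_preprocess

# hypothetical filename: load your already-trained Doc2Vec model
model = Doc2Vec.load('my_doc2vec.model')

# tokenize/preprocess exactly as you did for the training texts
new_tokens = simple_preprocess("text of a new, unseen document")

# infer a doc-vector for the new token list
new_vector = model.infer_vector(new_tokens)
```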
And, as you've noted, model.docvecs['my_tag']
will get the pre-trained doc-vector for one of the tags that was known during training.
Checking the similarity of a new vector against the vectors for all known tags is a reasonable baseline way to see which existing tags a new document is similar to. The closest tag, or closest few tags, might be reasonable labels for an unknown document, as a sort of 'nearest-neighbor' approach.
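Continuing the sketch above, that baseline might look like this (using the pre-4.0 docvecs attribute from your question; most_similar() accepts a raw vector in its positive list, and topn=4 is just a placeholder for however many tags you trained with):

```python
# rank all known tags by cosine similarity to the inferred vector;
# the top hit (or top few) serve as nearest-neighbor label guesses
for tag, similarity in model.docvecs.most_similar([new_vector], topn=4):
    print(tag, similarity)
```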
But note that the original/usual Doc2Vec approach is to give each document a unique ID, and let each ID-tag get its own vector. Then, perhaps, use those vectors together with the known labels to train some other classifier that maps vectors to labels, as sketched below. (This might work better in some cases, if the "areas of the doc-vector space" that humans associate with a particular label aren't neat radii around a single centroid point for each label.)
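A sketch of that alternative, under the assumptions that each training document was tagged with its integer position (so model.docvecs[i] lines up with a separate, hypothetical doc_labels list of known labels) and that scikit-learn's LogisticRegression is one arbitrary classifier choice among many:

```python
from sklearn.linear_model import LogisticRegression

# doc_labels is assumed: one known label per training document, in the
# same order as the unique integer tags 0..N-1 used during training
train_vectors = [model.docvecs[i] for i in range(len(doc_labels))]

classifier = LogisticRegression(max_iter=1000)
classifier.fit(train_vectors, doc_labels)

# classify the new document via its inferred vector
predicted_label = classifier.predict([new_vector])[0]
```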
Your approach of using, or adding, known labels as doc-tags can often help. But also note that if you're using only 4 unique tags across thousands of documents, that's functionally very similar to training the model with just 4 giant documents, which may not position those 4 vectors well in a high-dimensional space (>4 dimensions): there isn't enough of the variety and subtle contrast needed to nudge the trained vectors into useful arrangements. (Typical published Doc2Vec work uses tens of thousands to millions of unique docs and doc-tags.)