Those are two different methods of creating a vector for a set-of-words.
The vectors will be in different positions, and of different quality.
Averaging is quite fast, especially if you've already got word-vectors. But it's a very simple approach that won't capture many shades of meaning – indeed it is completely oblivious to word ordering/relative proximities, and the act of averaging can tend to 'cancel out' contrasting meanings in the text.
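For concreteness, here's a minimal sketch of the averaging approach, assuming you already have trained word-vectors loaded as a gensim `KeyedVectors` instance (the `kv` variable and the `average_vector` helper are just illustrative names):

```python
import numpy as np

def average_vector(tokens, kv):
    """Average the vectors of all in-vocabulary tokens; ignores word order entirely."""
    vectors = [kv[word] for word in tokens if word in kv]
    if not vectors:
        # No known words: fall back to a zero vector of the right size
        return np.zeros(kv.vector_size)
    return np.mean(vectors, axis=0)

# e.g. text_vec = average_vector("the quick brown fox".split(), kv)
```

Note how nothing about word order or proximity survives: any two texts with the same bag of words get the same vector.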
Doc2Vec instead trains vectors for full texts in a manner very similar to word-vectors (and often, alongside word-vectors). Essentially, a pretend-word that's assigned to the text 'floats' alongside the word-vector training, as if it were 'near' all the other words' training (for that one text). It's a slightly more sophisticated approach, but as it uses a very-similar algorithm (& model-complexity) on the same data, results on many downstream evaluations are often similar.
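A minimal sketch of that approach, using the gensim library's `Doc2Vec` class; the toy corpus, parameter values, and integer tags below are purely illustrative, and real training needs far more (and longer) texts:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus; each text gets a tag whose vector will be trained alongside the words
texts = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stock prices fell sharply on monday",
]
corpus = [TaggedDocument(words=t.split(), tags=[i]) for i, t in enumerate(texts)]

model = Doc2Vec(vector_size=100, min_count=1, epochs=40)
model.build_vocab(corpus)
model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

doc_vec = model.dv[0]                                     # vector learned for the 1st text
new_vec = model.infer_vector("a cat on a mat".split())    # vector inferred for an unseen text
```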
Obtaining summary text-vectors that capture more subtle shades of meaning, as implied by grammatical rules and more advanced language usage, can require yet-more-sophisticated methods, such as those employing larger deep networks.
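As one hedged example of that heavier-weight style (not something specifically endorsed here), a pretrained deep sentence-encoder via the separate sentence-transformers package; the particular model name is just an example of a small pretrained encoder:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vecs = model.encode(["the cat sat on the mat", "dogs chase cats in the yard"])
print(vecs.shape)  # (2, 384): one dense vector per input text
```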
There's no single most efficient approach, as the right choice in real use depends a lot on the type, quantity, and quality of your texts, and on your intended uses of the vectors.