I have a question about the word representation algorithms: Which one of the algorithms word2Vec, doc2Vec and Tf-IDF is more suitable for handling text classification tasks ? The corpus used in my supervised learning classification is composed of a list of multiple sentences, with both short length sentences and long length ones. As discussed in this thread, doc2vec vs word2vec choice is a matter of document length. As for Tf-Idf vs. word embedding, it's more a matter of text representation.
My other question is, what if for the same corpus I had more than one label to link to the sentences in it ? If I create multiple entries/labels for the same sentence, it affects the decision of the final classification algorithm. How can I tell the model that every label counts equal for every sentence of the document ?
Thank you in advance,