0
votes

I've got bag-of-words (BOW) vectors, and I'm wondering whether sklearn or gensim has a supervised dimensionality reduction algorithm that can take high-dimensional, labeled data and project it into a lower-dimensional space that preserves the variance between the classes.

What I'm actually trying to find is a proper metric for classification/regression, and I believe dimensionality reduction can help me. I know there are unsupervised methods, but I want to keep the label information along the way.


2 Answers

0
votes

fastText, an implementation from Facebook Research, can essentially help you achieve what you're asking for. Since you asked about gensim, I assume you're already aware of word2vec in gensim.

word2vec was proposed by Mikolov while he was at Google. Mikolov and his team at Facebook have since come up with fastText, which takes both word and subword information into account. It also supports text classification.

-1
votes

You can only perform dimensionality reduction in an unsupervised manner, or in a supervised manner with labels that differ from your target labels.

For example, you could train a logistic regression classifier on a dataset labeled with 100 topics. The output of this classifier (100 values per document), computed on your training data, could then serve as your dimensionality-reduced feature set.
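A minimal sketch of that idea with sklearn, assuming a tiny made-up corpus with 3 auxiliary topic labels instead of 100. The documents, the topic names, and the variable names are all hypothetical, and using `predict_proba` as the reduced representation is just one reasonable reading of "the output of this classifier":

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy corpus; the topic labels are auxiliary labels,
# not the final target labels of the downstream task.
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the goalkeeper",
    "the central bank raised interest rates",
    "stock markets fell after the report",
    "investors bought government bonds",
    "the new phone has a faster processor",
    "the laptop ships with more memory",
    "the software update fixes a bug",
]
topics = ["sport"] * 3 + ["finance"] * 3 + ["tech"] * 3

# High-dimensional bag-of-words features (one column per vocabulary word).
vectorizer = CountVectorizer()
X_bow = vectorizer.fit_transform(docs)

# Train the auxiliary topic classifier on the BOW vectors.
topic_clf = LogisticRegression(max_iter=1000)
topic_clf.fit(X_bow, topics)

# Per-document topic probabilities: one column per topic.
# These columns act as the low-dimensional feature set.
X_reduced = topic_clf.predict_proba(X_bow)

print(X_bow.shape, "->", X_reduced.shape)
```

Here each document ends up with one value per topic, so the number of reduced dimensions equals the number of auxiliary classes. `X_reduced` can then be passed to whatever model is trained on the real target labels.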