I have an array of TF-IDF feature vectors. I'd like to find similar vectors in the array using two methods:
- Cosine similarity
- k-means clustering
Using scikit-learn, both are straightforward.
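For concreteness, here is a minimal sketch of the unweighted baseline I have in mind. The toy corpus, the choice of two clusters, and the variable names are placeholders for illustration only, not my real data or settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus standing in for the real documents.
docs = [
    "the cat sat on the mat",
    "a cat lay on the rug",
    "stock prices fell sharply today",
    "markets and stock prices dropped",
]

# Sparse TF-IDF matrix of shape (n_docs, n_features).
X = TfidfVectorizer().fit_transform(docs)

# Method 1: pairwise cosine similarity between all rows (dense n_docs x n_docs).
sim = cosine_similarity(X)

# For each document, the index of the most similar *other* document.
masked = sim.copy()
np.fill_diagonal(masked, -1.0)  # exclude self-matches
nearest = masked.argmax(axis=1)

# Method 2: k-means cluster label for each document (k=2 is an arbitrary choice).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```

Here `nearest[i]` gives the closest neighbour of document `i` under cosine similarity, and `labels[i]` gives its k-means cluster.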
Now I'd like to weight certain features so that they influence the results more than the others. For example, I might want to weight the first 100 elements of each TF-IDF vector so that those features count more toward similarity than the rest.
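The most naive thing I can think of is to multiply the chosen columns by a constant before running either method, roughly as sketched below. The random matrix is only a stand-in for a real TF-IDF matrix, and the factor of 3.0 and the cluster count are arbitrary assumptions; I am not sure this is a meaningful way to do it.

```python
import numpy as np
from scipy import sparse
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for a TF-IDF matrix: 6 documents x 300 features.
X = sparse.random(6, 300, density=0.2, format="csr", random_state=0)

# One weight per feature: boost the first 100 columns, leave the rest at 1.
weights = np.ones(X.shape[1])
weights[:100] = 3.0  # arbitrary factor, chosen only for illustration

# Right-multiplying by a diagonal matrix scales column j by weights[j]
# while keeping the matrix sparse.
X_weighted = X @ sparse.diags(weights)

# Run both methods on the rescaled vectors.
sim_weighted = cosine_similarity(X_weighted)
labels_weighted = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_weighted)
```

One thing I do notice: scaling a column by `w` multiplies that feature's contribution to dot products and to squared Euclidean distances by `w**2`, so the effective weight is the square of the factor applied.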
How can I meaningfully weight certain features in my feature vectors? Is the process for weighting certain features the same for each of the similarity algorithms I listed above?