Weighted Cosine Similarity on Sparse Vectors

Question

I am computing the similarity between two sparse vectors using cosine similarity, which is working fine. However, I would like to take the additional step of applying a weight to each index of the vector. For example, the vectors to compare are v1 = [1, 0, 0, 1, 1] and v2 = [1, 0, 0, 0, 1], and the weighting vector is something like w = [.5, 1, 1, 2, 1.5]. This could be interpreted to mean that the first element is half as important as elements 2 and 3, the 4th element is twice as important, and the last element is 1.5 times as important to the similarity.

Is this even possible using cosine similarity? If so, how would I modify the original formula to incorporate these weights? Thanks! The original Java code is below.

private double score(Vector<Double> v1, Vector<Double> v2) throws Exception{
    int v1Size = v1.size();
    if (v1Size != v2.size()){
        throw new Exception("Vectors not same size");
    }
    double numerator = 0;
    double v1squaresum = 0;
    double v2squaresum = 0;
    for (int i = 0; i < v1Size; i++){
        double v1Val = v1.get(i);
        double v2Val = v2.get(i);
        numerator += (v1Val * v2Val);
        v1squaresum += (v1Val * v1Val);
        v2squaresum += (v2Val * v2Val);
    }
    if (numerator == 0 || v1squaresum == 0 || v2squaresum == 0){
        return 0;
    }
    double denom = (Math.sqrt(v1squaresum) * Math.sqrt(v2squaresum));
    return numerator / denom;
}

This should provide some help: mathforum.org/kb/message.jspa?messageID=5658016&tstart=0 — Yuri
Have seen both of those, the first does not help and is not correct anyways. And I was hoping for a more concrete implementation than the second, but I may go that route if none surfaces — holtc

holtc · Accepted Answer · 2016-09-02T14:26:50

0 votes

Solved by weighting the input vectors (multiplying each element by its weight) and then normalizing as in the ordinary cosine similarity. Thanks for the comments.
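For anyone landing here, a minimal sketch of one way to read that approach: scale both vectors element-wise by the weights, then compute the usual cosine similarity on the scaled vectors. This is not the code actually used; the class name WeightedCosine, the extra parameter w, and the switch from Vector<Double> to double[] are assumptions made for brevity.

```java
final class WeightedCosine {

    // Hypothetical helper: cosine similarity of (w * v1) and (w * v2),
    // where * is element-wise multiplication. The name and signature are
    // illustrative only, not from the original post.
    static double score(double[] v1, double[] v2, double[] w) {
        if (v1.length != v2.length || v1.length != w.length) {
            throw new IllegalArgumentException("Vectors not same size");
        }
        double numerator = 0;
        double v1SquareSum = 0;
        double v2SquareSum = 0;
        for (int i = 0; i < v1.length; i++) {
            // Apply the weight to each input before the usual sums.
            double a = w[i] * v1[i];
            double b = w[i] * v2[i];
            numerator += a * b;
            v1SquareSum += a * a;
            v2SquareSum += b * b;
        }
        // A zero-length weighted vector has no direction; return 0 as the
        // original code does.
        if (v1SquareSum == 0 || v2SquareSum == 0) {
            return 0;
        }
        return numerator / (Math.sqrt(v1SquareSum) * Math.sqrt(v2SquareSum));
    }

    public static void main(String[] args) {
        double[] v1 = {1, 0, 0, 1, 1};
        double[] v2 = {1, 0, 0, 0, 1};
        double[] w = {0.5, 1, 1, 2, 1.5};
        System.out.println(score(v1, v2, w));
    }
}
```

With the vectors from the question this gives roughly 0.62, compared with roughly 0.82 unweighted, because the heavily weighted 4th element is where the two vectors differ. Note that scaling the inputs this way makes each weight enter the sums squared; if the weights should act linearly on each term, multiply by Math.sqrt(w[i]) instead.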
