I am computing the similarity between two sparse vectors using cosine similarity, which is working fine. However, I would like to take the additional step of applying a weight to each index of the vector. For example, suppose the vectors to compare are v1 = [1, 0, 0, 1, 1] and v2 = [1, 0, 0, 0, 1], and the weighting vector is something like w = [0.5, 1, 1, 2, 1.5]. This could be interpreted to mean that the first element is half as important to the similarity as elements 2 and 3, the 4th element is twice as important, and the last element is 1.5 times as important.
Is this even possible using cosine similarity? If so, how would I modify the original formula to incorporate these weights? Thanks! The original Java code is below.
private double score(Vector<Double> v1, Vector<Double> v2) throws Exception {
    int v1Size = v1.size();
    if (v1Size != v2.size()) {
        throw new Exception("Vectors not same size");
    }
    double numerator = 0;    // dot product of v1 and v2
    double v1squaresum = 0;  // squared norm of v1
    double v2squaresum = 0;  // squared norm of v2
    for (int i = 0; i < v1Size; i++) {
        double v1Val = v1.get(i);
        double v2Val = v2.get(i);
        numerator += (v1Val * v2Val);
        v1squaresum += (v1Val * v1Val);
        v2squaresum += (v2Val * v2Val);
    }
    // Avoid dividing by zero when either vector is all zeros
    if (numerator == 0 || v1squaresum == 0 || v2squaresum == 0) {
        return 0;
    }
    double denom = (Math.sqrt(v1squaresum) * Math.sqrt(v2squaresum));
    return numerator / denom;
}
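
To make the question concrete, here is a minimal sketch of one formulation I could imagine: multiply each term of the dot product and of both squared norms by the weight for that index, which is the same as scaling every component of both vectors by the square root of its weight before taking the ordinary cosine. This is only an assumption about what "weighted cosine similarity" should mean, not something I have confirmed is the standard definition. The class name `WeightedCosine`, the use of `double[]` instead of `Vector<Double>`, and the requirement that weights be non-negative are all choices made for this illustration.

```java
// Hypothetical helper for illustration; not part of the original code.
final class WeightedCosine {

    // Cosine similarity with a per-index weight applied to every term.
    // Assumes all weights are non-negative (so the weighted norms are real).
    static double score(double[] v1, double[] v2, double[] w) {
        if (v1.length != v2.length || v1.length != w.length) {
            throw new IllegalArgumentException("Vectors not same size");
        }
        double numerator = 0;    // weighted dot product
        double v1squaresum = 0;  // weighted squared norm of v1
        double v2squaresum = 0;  // weighted squared norm of v2
        for (int i = 0; i < v1.length; i++) {
            if (w[i] < 0) {
                throw new IllegalArgumentException("Negative weight at index " + i);
            }
            numerator += w[i] * v1[i] * v2[i];
            v1squaresum += w[i] * v1[i] * v1[i];
            v2squaresum += w[i] * v2[i] * v2[i];
        }
        // Same zero guard as the unweighted version
        if (numerator == 0 || v1squaresum == 0 || v2squaresum == 0) {
            return 0;
        }
        return numerator / (Math.sqrt(v1squaresum) * Math.sqrt(v2squaresum));
    }
}
```

With the example vectors above, `WeightedCosine.score(v1, v2, w)` comes out to about 0.707, compared with about 0.816 when every weight is 1. The score drops because the only index where the two vectors differ (the 4th) carries the largest weight.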