0
votes

I have a matrix with a large number of rows, and a second matrix that I loop through one row at a time. For each row in the second matrix, I need to find the similar rows in the first matrix and get their row numbers. The rows will almost never match exactly, so ismember does not work.

Preferably (though not necessarily), the solution would also let me set a similarity threshold above which a row counts as similar and its row number is reported.

Is there any way to do this? I've looked around, and I can't find anything.

1
What is your measure of similarity, Euclidean distance? pdist2 should help you. – Dan
@ServerS Note also that pdist2 can be used with other distances, not just Euclidean. – Luis Mendo

1 Answer

3
votes

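You could use cosine similarity, which is the cosine of the angle between two vectors. Vectors pointing in nearly the same direction (in your case, a row and your comparison vector) have a value close to 1, orthogonal vectors have a value of 0, and vectors pointing in opposite directions have a value close to -1. Note that it compares direction only, not magnitude.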

function d = cosSimilarity(u, v)
  d = dot(u,v)/(norm(u)*norm(v));
end
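
As a quick sanity check of the formula, here is a minimal sketch that evaluates it on three hand-picked pairs of vectors. The anonymous function cosSim and the test vectors are illustrative names and values only; they simply restate the function above so the snippet runs on its own.

```matlab
% Same formula as cosSimilarity above, written as an anonymous function
% so this snippet is self-contained (cosSim is an illustrative name).
cosSim = @(u, v) dot(u, v) / (norm(u) * norm(v));

sameDirection = cosSim([1 0], [2 0]);    % parallel vectors, length ignored
orthogonal    = cosSim([1 0], [0 3]);    % perpendicular vectors
opposite      = cosSim([1 0], [-1 0]);   % vectors pointing opposite ways

disp([sameDirection orthogonal opposite])
```

A row of all zeros has norm 0, so the division produces NaN; such rows would need to be handled separately.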

To apply this function to all pairs of rows in the matrices M and V, you could use nested for loops. Hardly the most elegant, but it will work:

numRowsM = size(M, 1);
numRowsV = size(V, 1);
similarThresh = 0.9;   % minimum cosine similarity that counts as "similar"

for m = 1:numRowsM
    for v = 1:numRowsV
        similarity = cosSimilarity(V(v,:), M(m,:));

        % Notify about similar rows, reporting both row numbers
        if similarity > similarThresh
            disp(['Row ' num2str(m) ' of M is similar to row ' num2str(v) ' of V'])
        end
    end
end

Nested for loops are not the only way to do this. You could start by looking at the solution from this question, which avoids the explicit loops by converting the rows of each matrix into cells of a cell array and then applying the function with cellfun.
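
Another possibility, sketched here under the assumption that both matrices fit in memory and contain no all-zero rows, is to normalize every row to unit length and compute all cosine similarities with one matrix product. This is not the cellfun approach mentioned above, just an alternative. The example data and the names Mn, Vn, S and similarRows are made up for illustration.

```matlab
% Hypothetical example data: M is the large matrix, V holds the query rows.
M = [1 0 0; 0 1 0; 1 1 0; 2 0.1 0];
V = [1 0.05 0; 0 2 0];
similarThresh = 0.9;

% Scale every row to unit length so that a dot product equals the cosine.
% (bsxfun keeps this working on releases without implicit expansion.)
Mn = bsxfun(@rdivide, M, sqrt(sum(M.^2, 2)));
Vn = bsxfun(@rdivide, V, sqrt(sum(V.^2, 2)));

% S(v, m) is the cosine similarity between row v of V and row m of M.
S = Vn * Mn.';

% For each row of V, list the row numbers of M above the threshold.
for v = 1:size(V, 1)
    similarRows = find(S(v, :) > similarThresh);
    disp(['Row ' num2str(v) ' of V matches rows of M: ' mat2str(similarRows)])
end
```

If the Statistics and Machine Learning Toolbox is available, pdist2(V, M, 'cosine') (suggested in the comments) returns one minus the cosine of the angle, so thresholding 1 - pdist2(V, M, 'cosine') should give the same row numbers.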