3
votes

I have a matrix whose first column contains user IDs and whose second column contains 1s and 0s. I need to find the top 10 user IDs with the largest number of 1s. For example, if the input matrix is the following,

27 0
36 0
36 1
36 0
36 0
27 0
27 0
36 1
27 0
27 0
27 0
27 0
27 1
36 0

I want the output to be the following:

36 2
27 1

That is, even though 27 occurs 8 times and 36 occurs only 6 times, 36 should come before 27 because it has more 1 values in the original matrix. How do I do this without using a for loop? A for loop takes a lot of time, and the matrix is actually pretty big, with lots of unique user IDs.

2
Which language are you using and what have you tried so far? - albert
@albert MATLAB. I used the matlab tag when posting the question. - Kristada673
What does the number 2 mean here? - choroba
@choroba The frequency of occurrence. 36 has the value 1 two times; 27 has the value 1 one time. - Kristada673

2 Answers

2
votes

You can find all user IDs with the unique function, which returns all unique values in an array:

ids = unique(inp(:,1));

The number of ones for a given ID can be calculated by

sum( inp(inp(:,1)==ID,2) )

which uses some matrix indexing to find all rows where the first entry is the selected user id ID, and sums up the second entries of those rows.
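As a quick check against the sample data from the question, here is a minimal sketch for a single ID. It assumes `inp` is the two-column matrix from the question; the names `ID`, `mask` and `nOnes` are chosen only for illustration.

```matlab
% Sample matrix from the question: column 1 = user ID, column 2 = flag (0 or 1)
inp = [27 0; 36 0; 36 1; 36 0; 36 0; 27 0; 27 0; ...
       36 1; 27 0; 27 0; 27 0; 27 0; 27 1; 36 0];

ID = 36;                      % the user ID to inspect
mask = inp(:,1) == ID;        % logical column: true for rows that belong to ID
nOnes = sum(inp(mask, 2));    % add up the flags of those rows (2 for this sample)
```

Splitting the one-liner into `mask` and `nOnes` does the same work; it only makes the logical indexing step visible.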

To compute these sums for every ID without writing an explicit loop, you can use the arrayfun function, which applies a function to each element of the supplied array. The function to apply is the sum introduced above, and the array to apply it to is the list of unique IDs:

arrayfun(@(x)sum(inp(inp(:,1)==x,2)),ids)

ans = 
    1
    2
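The result above follows the order of `ids` (ascending, so 27 first, then 36) and is not yet ranked. One possible way to finish the task is sketched below, assuming `inp` holds the matrix from the question; the names `counts`, `order`, `ranked` and `top10` are illustrative.

```matlab
% Sample matrix from the question: column 1 = user ID, column 2 = flag (0 or 1)
inp = [27 0; 36 0; 36 1; 36 0; 36 0; 27 0; 27 0; ...
       36 1; 27 0; 27 0; 27 0; 27 0; 27 1; 36 0];

ids = unique(inp(:,1));                                 % sorted unique user IDs
counts = arrayfun(@(x) sum(inp(inp(:,1)==x, 2)), ids);  % number of 1s per ID

[~, order] = sort(counts, 'descend');                   % rank IDs by their count
ranked = [ids(order) counts(order)];                    % [ID, count], largest count first
top10 = ranked(1:min(10, size(ranked,1)), :);           % keep at most ten rows
```

For the sample data, `top10` comes out as the two rows `36 2` and `27 1`. The `min(10, ...)` guard is there so the indexing does not fail when fewer than ten distinct IDs exist.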

2
votes

That problem setup seems perfect for solving with unique & accumarray -

%// Select rows with col-2 as 1s & find unique col-1 elements and IDs 
[unq_sA1,~,id] = unique(A(A(:,2)==1,1))

%// Get counts of such unique rows
counts = accumarray(id(:),1)

%// Get argsort for the counts to index into unique rows and the counts
[~,sort_idx] = sort(counts,'descend')
out = [unq_sA1(sort_idx) counts(sort_idx)]
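To try this on the sample from the question and keep only the ten highest-ranked users, here is a short sketch. It assumes `A` is the two-column input matrix; `top10` is an illustrative name, and semicolons are added to suppress the printed output.

```matlab
% Sample matrix from the question: column 1 = user ID, column 2 = flag (0 or 1)
A = [27 0; 36 0; 36 1; 36 0; 36 0; 27 0; 27 0; ...
     36 1; 27 0; 27 0; 27 0; 27 0; 27 1; 36 0];

[unq_sA1, ~, id] = unique(A(A(:,2)==1, 1));   % IDs that have at least one 1
counts = accumarray(id(:), 1);                % how many 1s each of those IDs has
[~, sort_idx] = sort(counts, 'descend');      % order of counts, largest first
out = [unq_sA1(sort_idx) counts(sort_idx)];   % [36 2; 27 1] for this sample

top10 = out(1:min(10, size(out,1)), :);       % at most the first ten rows
```

Because the rows are filtered on `A(:,2)==1` before `unique` is called, IDs that never have a 1 do not appear in `out` at all, which is harmless when only the top entries are needed.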