1
votes

I have a matrix like the one below. I want to delete the rows that have duplicate values in the first column, keeping only the row whose second-column value has the fewest duplicates in the second column. My matrix `d`:

 1     1
 2     1
 4     1
 8     2
 2     2
 5     4
 2     4
 6     4
 7     3

I want to remove the duplicates of the number 2 in the first column, keeping the row whose second-column value has the fewest duplicates in the second column. Required result:

 1     1
 4     1
 8     2
 2     2
 5     4
 6     4
 7     3

Thanks for the help. Best regards.

2
The last lines in the input and output matrices are not the same. - Alexander Korovin

2 Answers

1
votes
  1. Create a function that, given a value from the left column, returns the right-column value with the fewest duplicates among the rows holding that value:

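    function Out = getMinDuplicate (Index, Data)
      % Column-2 values of the rows whose column-1 value equals Index
      Candidates = Data(Data(:,1) == Index, 2);
      % Hist(v) = number of times v occurs in column 2 (assumes positive integers)
      Hist = histc (Data(:,2), 1 : max(Data(:,2)));
      % Pick the candidate with the fewest occurrences (the first one on ties)
      [~, Out] = min (Hist(Candidates));
      Out = Candidates(Out);
    end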
    
  2. Call this function for all unique values in column 1:

    >> [unique(d(:,1)), arrayfun(@(x) getMinDuplicate(x, d), unique(d(:,1)))]
    ans =
         1     1
         2     2
         4     1
         5     4
         6     4
         7     3
         8     2
    

(where d is your data array).
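Note that `Hist(Candidates)` indexes the histogram by value, so the function above relies on column 2 holding positive integers. As a minimal sketch (a variation, not the code from this answer), the counting could instead map each value to a group index with `unique` and count with `accumarray`. The variable names `grp`, `cnt`, `rowCount`, `keys`, `vals` and `result` are made up for illustration:

```matlab
d = [1 1; 2 1; 4 1; 8 2; 2 2; 5 4; 2 4; 6 4; 7 3];

% Map every column-2 value to a group index 1..K, whatever the values are
[~, ~, grp] = unique(d(:,2));
% cnt(k) = number of rows whose column-2 value belongs to group k
cnt = accumarray(grp(:), 1);
% Per-row count: how often this row's column-2 value occurs overall
rowCount = cnt(grp(:));

keys = unique(d(:,1));
vals = zeros(size(keys));
for k = 1:numel(keys)
  rows = find(d(:,1) == keys(k));    % rows sharing this column-1 value
  [~, best] = min(rowCount(rows));   % fewest duplicates
  vals(k) = d(rows(best), 2);
end
result = [keys, vals]
```

For the sample data this gives the same seven rows as the `arrayfun` call above, again ordered by column 1 rather than by the original row order.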

2
votes

We can sort the array by its first column and replace each element of the second column with the number of times it occurs, ordered by descending count within each first-column value, to obtain this array:

   1   3
   2   3
   2   3
   2   2
   4   3
   5   3
   6   3
   7   1
   8   2

If we then apply `unique` to the first column of this array, taking the last occurrence of each value, we get the indices of the desired rows, which can then be extracted:

   1   1
   2   2
   4   1
   5   4
   6   4
   7   3
   8   2

If the order of the original data should be preserved, a few more steps are required; they are commented in the code.

    a = [...
     1     1
     2     1
     4     1
     8     2
     2     2
     5     4
     2     4
     6     4
     7     3];
    % steps to replace each element of column 2 with its count
    [a2_sorted, i_a2_sorted] = sort(a(:,2));
    [a2_sorted_unique, i_a2_sorted_unique] = unique(a2_sorted);
    h = hist(a2_sorted, a2_sorted_unique);
    %count = repelems(h, [1:numel(h); h]); % octave
    count = repelem(h, h);
    [~, a2_back_idx] = sort(i_a2_sorted);
    count = count(a2_back_idx);
    b = [a(:,1), count.'];
    % counts should be sorted in descending order because we take the
    % last row of each category; 'last' is requested explicitly since the
    % occurrence that unique returns by default differs between releases
    [b_sorted, i_b_sorted] = sortrows(b, [1 -2]);
    [~, i_b1_sorted_unique] = unique(b_sorted(:,1), 'last');
    c = [b_sorted(:,1), a(i_b_sorted,2)];
    out = c(i_b1_sorted_unique,:)
    % more steps to recover the original order
    [~, b_back_idx] = sort(i_b_sorted);
    idx_logic = false(1, size(a,1));
    idx_logic(i_b1_sorted_unique) = true;
    idx_logic = idx_logic(b_back_idx);
    out = c(b_back_idx(idx_logic),:)
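
The same idea can probably be written more compactly. The following is only a sketch of a variant, not the code above: it counts with `accumarray` instead of `hist`/`repelem`, sorts the counts in ascending order, and asks `unique` for the first occurrence explicitly so the result does not depend on the default. The variable names `grp`, `cnt`, `rowCount`, `order`, `firstIdx` and `keep` are made up for illustration:

```matlab
a = [1 1; 2 1; 4 1; 8 2; 2 2; 5 4; 2 4; 6 4; 7 3];

% Count how often each column-2 value occurs, then expand to one count per row
[~, ~, grp] = unique(a(:,2));
cnt = accumarray(grp(:), 1);
rowCount = cnt(grp(:));

% Sort row indices by column 1, then by ascending count
[~, order] = sortrows([a(:,1), rowCount]);
% 'first' picks, for every column-1 value, a row with the smallest count
[~, firstIdx] = unique(a(order,1), 'first');
% Sorting the selected row numbers restores the original row order
keep = sort(order(firstIdx));
out = a(keep, :)
```

For the sample matrix this keeps rows 1, 3, 4, 5, 6, 8 and 9 of `a`, which is the result requested in the question.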