1
votes

Say we have a matrix of zeros and ones

 0     1     1     1     0     0     0
 1     1     1     1     0     1     1
 0     0     1     0     0     1     0
 0     1     1     0     1     1     1
 0     0     0     0     0     0     1
 0     0     0     0     0     0     1

and we want to find all the submatrices (we just need the row indices and column indices of the corners) with these properties:

  1. contain at least L ones and L zeros
  2. contain max H elements

i.e. take the previous matrix with L=1 and H=5, the submatrix 1 2 1 4 (row indices 1 2 and column indices 1 4)

 0     1     1     1
 1     1     1     1

satisfies the property 1 but has 8 elements (bigger than 5) so it is not good;

the matrix 4 5 1 2

 0     1
 0     0

is good because satisfies both the properties.

The objective is then to find all the submatrices with min area 2*L, max area H and containg at least L ones and L zeros.

If we consider a matrix as a rectangle it is easy to find all the possibile subrectangles with max area H and min area 2*L by looking at the divisors of all the numbers from H to 2*L.

For example, with H=5 and L=1 all the possibile subrectangles/submatrices are given by the divisors of

  • H=5 -> divisors [1 5] -> possibile rectangles of area 5 are 1x5 and 5x1
  • 4 -> divisors [1 2 4] -> possibile rectangles of area 4 are 1x4 4x1 and 2x2
  • 3 -> divisors [1 3] -> possibile rectangles of area 3 are 3x1 and 1x3
  • 2*L=2 -> divisors [1 2] -> possibile rectangles of area 2 are 2x1 and 1x2

I wrote this code, which, for each number finds its divisors and cycles over them to find the submatrices. To find the submatrices it does this: take for example a 1x5 submatrix, what the code does is to fix the first line of the matrix and move step by step (along all the columns of the matrix) the submatrix from the left edge of the matrix to the right edge of the matrix, then the code fixes the second row of the matrix and moves the submatrix along all the columns from left to right, and so on until it arrives at the last row.

It does this for all the 1x5 submatrices, then it considers the 5x1 submatrices, then the 1x4, then the 4x1, then the 2x2, etc.

The code do the job in 2 seconds (it finds all the submatrices) but for big matrices, i.e. 200x200, a lot of minutes are needed to find all the submatrices. So I wonder if there are more efficient ways to do the job, and eventually which is the most efficient.

This is my code:

clc;clear all;close all

%% INPUT
P=  [0     1     1     1     0     0     0 ;
     1     1     1     1     0     1     1 ;
     0     0     1     0     0     1     0 ;
     0     1     1     0     1     1     1 ;
     0     0     0     0     0     0     1 ;
     0     0     0     0     0     0     1];
L=1; % a submatrix has to containg at least L ones and L zeros
H=5; % max area of a submatrix

[R,C]=size(P); % rows and columns of P
sub=zeros(1,6); % initializing the matrix containing the indexes of each submatrix (columns 1-4), their area (5) and the counter (6)
counter=1; % no. of submatrices found

%% FIND ALL RECTANGLES OF AREA >= 2*L & <= H
%
% idea: all rectangles of a certain area can be found using the area's divisors
%       e.g. divisors(6)=[1 2 3 6] -> rectangles: 1x6 6x1 2x3 and 3x2
tic
for sH = H:-1:2*L % find rectangles of area H, H-1, ..., 2*L
    div_sH=divisors(sH); % find all divisors of sH
    disp(['_______AREA ', num2str(sH), '_______'])

    for i = 1:round(length(div_sH)/2) % cycle over all couples of divisors        
        div_small=div_sH(i);
        div_big=div_sH(end-i+1);        

        if div_small <= R && div_big <= C % rectangle with long side <= C and short side <= R

            for j = 1:R-div_small+1 % cycle over all possible rows

                for k = 1:C-div_big+1 % cycle over all possible columns                    
                    no_of_ones=length(find(P(j:j-1+div_small,k:k-1+div_big))); % no. of ones in the current submatrix

                    if  no_of_ones >= L  &&  no_of_ones <= sH-L % if the submatrix contains at least L ones AND L zeros                        
                        %               row indexes    columns indexes         area         position
                        sub(counter,:)=[j,j-1+div_small , k,k-1+div_big , div_small*div_big , counter]; % save the submatrix
                        counter=counter+1;
                    end
                end
            end
            disp([' [', num2str(div_small), 'x', num2str(div_big), '] submatrices: ', num2str(size(sub,1))])
        end
        if div_small~=div_big % if the submatrix is a square, skip this part (otherwise there will be duplicates in sub)

            if div_small <= C && div_big <= R % rectangle with long side <= R and short side <= C

                for j = 1:C-div_small+1 % cycle over all possible columns

                    for k = 1:R-div_big+1 % cycle over all possible rows                        
                        no_of_ones=length(find(P(k:k-1+div_big,j:j-1+div_small)));

                        if no_of_ones >= L  &&  no_of_ones <= sH-L
                            sub(counter,:)=[k,k-1+div_big,j,j-1+div_small , div_big*div_small, counter];
                            counter=counter+1;
                        end
                    end
                end
                disp([' [', num2str(div_big), 'x', num2str(div_small), '] submatrices: ', num2str(size(sub,1))])
            end
        end
    end
end
fprintf('\ntime: %2.2fs\n\n',toc)
2

2 Answers

3
votes

Here is a solution centered around 2D matrix convolution. The rough idea is to convolve P for each submatrix shape with a second matrix such that each element of the resulting matrix indicates how many ones are in the submatrix having its top left corner at said element. Like this you get all solutions for a single shape in one go, without having to loop over rows/columns, greatly speeding things up (it takes less than a second for a 200x200 matrix on my 8 years old laptop)

P=  [0     1     1     1     0     0     0
    1     1     1     1     0     1     1
    0     0     1     0     0     1     0
    0     1     1     0     1     1     1
    0     0     0     0     0     0     1
    0     0     0     0     0     0     1];

L=1; % a submatrix has to containg at least L ones and L zeros
H=5; % max area of a submatrix
submats = [];
for sH = H:-1:2*L
    div_sH=divisors(sH); % find all divisors of sH
    for i = 1:length(div_sH) % cycle over all couples of divisors
        %number of rows of the current submatrix
        nrows=div_sH(i); 
        % number of columns of the current submatrix
        ncols=div_sH(end-i+1); 
        % perpare matrix to convolve P with
        m = zeros(nrows*2-1,ncols*2-1);
        m(1:nrows,1:ncols) = 1;
        % get the number of ones in the top left corner each submatrix
        submatsums = conv2(P,m,'same');
        % set values where the submatrices go outside P invalid
        validsums = zeros(size(P))-1;
        validsums(1:(end-nrows+1),1:(end-ncols+1)) = submatsums(1:(end-nrows+1),1:(end-ncols+1));
        % get the indexes where the number of ones and zeros is >= L
        topLeftIdx = find(validsums >= L & validsums<=sH-L);
        % save submatrixes in following format: [index, nrows, ncols]
        % You can ofc use something different, but it seemed the simplest way to me
        submats = [submats ; [topLeftIdx bsxfun(@times,[nrows ncols],ones(length(topLeftIdx),1))]];
    end
end
1
votes

First, I suggest that you combine finding the allowable sub-matrix sizes.

for smaller = 1:sqrt(H)
    for larger = 2*L:H/smaller
        # add smaller X larger and larger x smaller to your shapes list

Next, start with the smallest rectangles in the shapes. Note that any solution to a small rectangle can be extended in any direction, to the area limit of H, and the added elements will not invalidate the solution you found. This will identify many solutions without bothering to check the populations within.

Keep track of the solutions you've found. As you work your way toward larger rectangles, you can avoid checking anything already in your solutions set. If you keep that in a hash table, checking membership is O(1). All you'll need to check thereafter will be larger blocks of mostly-1 adjacent to mostly-0. This should speed up the processing somewhat.

Is that enough of a nudge to help?