1
votes

The assignment problem asks to find a set of n elements with maximal possible sum in distinct rows and columns of a given n-by-n matrix. It can be solved optimally by the Hungarian algorithm in O(n^3). However, let us consider the following suboptimal greedy algorithm:

  1. Choose the maximal element in the remaining matrix
  2. Add this element to the resulting set, i.e. match the row of this element to its column, and remove these row and column from the matrix
  3. Repeat from the step 1.

What would be an efficient data structure to implement such an algorithm? Can its time complexity be O(n^2), if it requires O(n^2 log(n)) only to sort all the elements?

2
O(n^2) is fairly straightforward, but what's the point? - Amit
The point is to reduce the complexity by sacrificing optimality, but how do you do this in O(n^2)? - xivaxy
Sorry I misunderstood you the first time. You want your algorithm in O(n^2) including the loop (step 3), right? - Amit
Right, I want to find all these n elements with possibly maximal sum. - xivaxy

2 Answers

2
votes

A complexity of O(n2logn) may be achieved using something as follows.

You initially store a sorted version of each of n rows and columns.(the exact structure this stored rows and columns will use is clarified later in this answer) Note that, even though you will sort these according to the values of the cells, the values of the cells should be stored alongside their initial indices. The initial indices will help when deleting rows and columns. (step 2 of your algorithm)

After the preprocessing mentioned above, have a loop that iterates n times. Inside this loop will be the steps of your algorithm. At each iteration;

Step 1: Check the largest elements in each sorted row and column, updating the value and address for the globally maximal element.

Step 2.1: Add element to the resulting array/list/etc, and remove it from the sorted row and column structure of the corresponding row and column.

Step 2.2: Traverse each sorted row structure. For each row, we need to delete a column. Since we have the matrix and the address of the maximal element, we can also access the exact value and address of the column we are deleting for each row. Utilizing this, in each sorted row structure, given it is sorted, conduct a binary search to find that element, and remove it.

Here is the thing! If you store this sorted row structure as a straightforward array, removing this element will take O(n) time, as you have to carry the remaining part of the array to the left by 1. A solution I can think of is to use a structure akin to STL's set. It can, in fact, find in O(logn) just like binary search, and provides O(logn) time insertion and deletion.

Step 2.3: Identical to Step 2.2, this time each column structure is traversed. As we are deleting a row, we are removing an element from each column. However, thanks to having found the address(i.e. row&column) and the value of the globally maximal element, we know which row we are removing and the value and address of the element we remove from each column. So, for each column structure we find and remove that element from the STL's set-like data structure storing the sorted version of that column.

Performance

Preprocessing: Now that we know what kind of a data structure stores sorted rows and columns, we can say that it takes O(n2logn) as there are 2n of those structures, and we insert n elements to each one of them in O(logn) time.

Step 1: There are 2n sorted structures, which means accessing their largest element in O(logn) time will have a combined O(nlogn) time complexity.

Step 2.1: Though it depends on the data structure the resulting data is kept in, assuming it is an array, it takes O(1) time to add an element to it. However this step has an overall complexity of O(logn) as we need to remove the maximal element from the sorted row and column structures it belongs to.

Step 2.2: There are n sorted row structures for which we do a find and remove operation, causing an O(nlogn) burden.

Step 2.3: Has O(nlogn) cost, same as Step 2.2.

Considering steps 2.1, 2.2, and 2.3 are done n times, the overall complexity of the algorithm is O(n2logn).

Implementation

Below you may find a complete implementation of the algorithm I mentioned above, which seems to be working. The gist of the algorithm is the same, with minor changes due to implementation like defining a negative infinity or using an extra bool vector to store whether a row or column is completely deleted.(i.e. we won't bother to look at its sorted structure when looking for the maximal element in remaining matrix)

#include <set>
#include <vector>
#include <iostream>
#define NEGATIVE_INFINITY -1

using namespace std;

vector< set< pair<int, int> > > sortedRows;
vector< set< pair<int, int> > > sortedCols;
vector< int > result;
void solve(const vector< vector< int > > &matrix)
{
    int n = matrix.size();
    vector<bool> rowNotDeleted(n, true), colNotDeleted(n, true);

    // Preprocessing
    for(int i=0; i < n; ++i)
    {
        sortedRows.resize(n);
        sortedCols.resize(n);
        for(int j=0; j < n; ++j)
        {
            sortedRows[i].insert( make_pair(matrix[i][j], j) );
            sortedCols[j].insert( make_pair(matrix[i][j], i) );
        }
    }

    for(int k=0; k < n; ++k)
    {
        set< pair<int, int> >::reverse_iterator it;

        // STEP 1: Find max. element
        int maxVal = NEGATIVE_INFINITY, maxRow = -1, maxCol = -1;
        for(int i=0; i < n; ++i)
        {
            if(rowNotDeleted[i])
            {
                it = sortedRows[i].rbegin();
                if(it->first > maxVal)
                {
                    maxVal = it->first;
                    maxRow = i;
                    maxCol = it->second;
                }
            }
        }

        for(int i=0; i < n; ++i)
        {
            if(colNotDeleted[i])
            {
                it = sortedCols[i].rbegin();
                if(it->first > maxVal)
                {
                    maxVal = it->first;
                    maxRow = it->second;
                    maxCol = i;
                }
            }
        }

        // STEP 2.1: Add max. element to result.
        result.push_back(maxVal);
        /*
         * Due to my implementation, removing it from
         * relevant sorted data structures can be done
         * in steps 2.2 and 2.3.
         */
//        sortedRows[maxRow].erase( make_pair(maxVal, maxCol) );
//        sortedCols[maxCol].erase( make_pair(maxVal, maxRow) );
        rowNotDeleted[maxRow] = false;
        colNotDeleted[maxCol] = false;

        // STEP 2.2: Remove cells of deleted col from sorted row structures
        for(int i=0; i < n; ++i)
        {
            sortedRows[i].erase( make_pair(matrix[i][maxCol], maxCol) );
        }

        // STEP 2.3: Remove cells of deleted row from sorted col structures
        for(int i=0; i < n; ++i)
        {
            sortedCols[i].erase( make_pair(matrix[maxRow][i], maxRow) );
        }
    }
}

int main()
{
    int n;
    cin >> n;

    // Read matrix
    vector< vector< int > > matrix(n);
    for(int i=0; i < n; ++i)
    {
        matrix[i].resize(n);
        for(int j=0; j < n; ++j)
        {
            cin >> matrix[i][j];
        }
    }

    solve(matrix);

    // Output results
    cout << result[0];
    for(int i=1; i < n; ++i)
    {
        cout << " " << result[i];
    }
    cout << endl;

    return 0;
}
2
votes

Once the elements are sorted, you can just scan through, using two arrays to track the deleted rows and columns. In untested Python:

def greedy_max_match(weight):
    n = len(weight)
    row_deleted = [False] * n
    col_deleted = [False] * n
    sorted_weights = sorted(((weight[i][j], i, j)
                             for i in range(n) for j in range(n)),
                            reverse=True)
    for w, i, j in sorted_weights:
        if not row_deleted[i] and not col_deleted[j]:
            yield (i, j)
            row_deleted[i] = True
            col_deleted[j] = True