I have implemented a Matrix datatype in C++ backed by a 1D array that is wrapped into rows and columns. Now I want to be able to create square (blocked) sub-matrices from this matrix, and I want to do it in-memory.
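For reference, the layout is roughly like this (the names are illustrative, not my actual code):

```cpp
#include <cstddef>
#include <vector>

// Row-major matrix backed by a single contiguous 1D buffer.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;   // rows * cols elements, row-major

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c) {}

    float&       operator()(std::size_t i, std::size_t j)       { return data[i * cols + j]; }
    const float& operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};
```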
The problem is that I want some of these sub-matrices to be transferable to GPU memory so they can be processed there in parallel; this is useful, for example, for matrix multiplication. Since the sub-matrices are not contiguous in main memory, copying one of them to device memory as a single unit seems impossible without creating a separate copy. I would like the GPU sub-matrix to map directly back to the original CPU matrix, both so updates can flow back and for efficiency. I also don't know the exact partitioning in advance.
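To make the issue concrete, here is a rough sketch of copying a single B×B block with a strided copy (using the Matrix sketch above; block size and names are just for illustration). `cudaMemcpy2D` can move the whole block in one call, but it still produces a detached device copy rather than a live mapping to the host matrix:

```cpp
#include <cuda_runtime.h>

// Sketch only: copy one B x B sub-block of a row-major host matrix to the
// device with a pitched copy. This creates a separate device copy; it does
// not keep the device block mapped to the original host matrix.
void copyBlockToDevice(const Matrix& A, std::size_t row0, std::size_t col0,
                       std::size_t B, float* d_block /* B*B floats on device */)
{
    const float* src = &A.data[row0 * A.cols + col0];    // top-left element of the block
    cudaMemcpy2D(d_block,                 // dst
                 B * sizeof(float),       // dst pitch: block rows packed tightly
                 src,                     // src
                 A.cols * sizeof(float),  // src pitch: full row stride of A
                 B * sizeof(float),       // width of one block row in bytes
                 B,                       // number of rows to copy
                 cudaMemcpyHostToDevice);
}
```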
Does anyone have an idea how I can achieve this?
Just a reminder: the matrix needs to be partitioned into blocks, not row-wise (which would be relatively easy in C/C++).