I am new to parallel programming. I am trying to multiply two matrices. I have partitioned the problem as follows:
Let the operation be mat3 = mat1 x mat2. I broadcast mat2 to all processes in the communicator, cut mat1 into strips of rows, and scatter those strips to the processes in the communicator group. Once every process has the full mat2 and its strip of mat1, it multiplies its strip by mat2. I then use a gather operation on the local results to assemble the complete mat3 in the root process.
I wanted to know whether there is a better way to partition the problem when multiplying two matrices on a general-purpose computer.
My implementation is in OpenMPI.