I am new to MPI and parallel computing environments. I have several sparse matrices from https://sparse.tamu.edu/, and I am trying to multiply an N x N sparse matrix by an N x 1 dense vector using row partitioning. I will test my parallel MPI program with 1, 2, 4, 8, and 16 processes.
I have done some research on this algorithm and found a promising approach and roadmap in this presentation: https://www.sandia.gov/~mmwolf/presentations/CS591MH/CS591MH_20070913.pdf
The algorithm is as follows:
- First, partition the whole sparse matrix row-wise across the processes, and also partition the dense vector. For memory efficiency, store only the nonzero elements of the sparse matrix.
- Before doing any computation, send each vector element x[j] to the remote processes that have nonzeros in column j.
- Do the computation and store each row's result in the output vector.
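To make the steps concrete, here is a minimal serial sketch of how I understand them. It contains no MPI calls: it only simulates P ranks inside one Python process. It assumes the matrix is stored in CSR form (indptr, indices, data) and that rows and x entries are split into contiguous blocks; the presentation may partition differently. All function names (block_range, owner_of, build_plan, spmv_rowwise) are hypothetical and exist only for this illustration.

```python
# Serial sketch only: simulates P ranks, no real MPI communication.
# Assumptions: CSR storage and a contiguous block partition of rows and x.

def block_range(rank, n, p):
    """Half-open range [start, end) of rows (and x entries) owned by `rank`."""
    base, extra = divmod(n, p)
    start = rank * base + min(rank, extra)
    return start, start + base + (1 if rank < extra else 0)

def owner_of(j, n, p):
    """Rank that owns x[j] under the same block partition."""
    for r in range(p):
        lo, hi = block_range(r, n, p)
        if lo <= j < hi:
            return r
    raise IndexError(j)

def build_plan(indptr, indices, n, p):
    """recv[r][q] = sorted column indices j that rank r needs from rank q."""
    recv = [dict() for _ in range(p)]
    for r in range(p):
        lo, hi = block_range(r, n, p)
        # Columns with at least one nonzero in the rows owned by rank r.
        needed = set(indices[indptr[lo]:indptr[hi]])
        for j in sorted(needed):
            q = owner_of(j, n, p)
            if q != r:
                recv[r].setdefault(q, []).append(j)
    return recv

def spmv_rowwise(indptr, indices, data, x, p):
    """Compute y = A x rank by rank; returns (y, recv plan)."""
    n = len(x)
    recv = build_plan(indptr, indices, n, p)
    y = [0.0] * n
    for r in range(p):
        lo, hi = block_range(r, n, p)
        # Entries of x that rank r owns itself.
        x_local = {j: x[j] for j in range(lo, hi)}
        # Remote entries: this copy stands in for a receive from rank q.
        for q, cols in recv[r].items():
            for j in cols:
                x_local[j] = x[j]
        # Local computation over the rows owned by rank r.
        for i in range(lo, hi):
            y[i] = sum(data[k] * x_local[indices[k]]
                       for k in range(indptr[i], indptr[i + 1]))
    return y, recv

# Example: 4 x 4 matrix [[1,0,0,2],[0,3,0,0],[4,0,5,0],[0,0,0,6]] in CSR form.
indptr = [0, 2, 3, 5, 6]
indices = [0, 3, 1, 0, 2, 3]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 1.0, 1.0, 1.0]
y, recv = spmv_rowwise(indptr, indices, data, x, p=2)
```

In this example with 2 ranks, recv says that rank 0 needs x[3] from rank 1 and rank 1 needs x[0] from rank 0. The line `x_local[j] = x[j]` marks the place where a real program would exchange messages instead of copying.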
I do not understand how to determine which remote processes each x[j] should be sent to. Once I have determined them, how can I communicate between these processes with non-blocking send and receive operations? Should I use a for loop to issue each send operation?
Thanks in advance.