Strange double values appearing from MPI communication - memory issues?

Question

This is a follow-up to this question: I think I've solved the problem that the original question asked about, but I now have some other problems.

I have some MPI code which does a matrix transpose via point-to-point non-blocking communication, using MPI_Isend and MPI_Irecv. I am working with doubles, and all of my MPI calls use MPI_DOUBLE as the type. However, I seem to be getting some strange memory issues, the main symptom being 'nonsense' numbers in my output. For example:

Test Process (2, 1): 68.000000 78.000000 
Test Process (2, 1): 387323398486945739062068424931898425134839058804189460794109462554519403357109477747039490936107027309191462010675537134594564349232145421118587860238537662203953149049188364045280831238661272720084252520359127715290869606638545797120.000000 881150864511763756676254370742733018389256944202962553716402946507192139671624750374865205489904045881646541419557063427368973644261533211221769931916194052019466643963904.000000 
Test Process (2, 1): 78.000000 88.000000

I can guess that somehow there is a memory problem - I'm reading some memory as a double when it isn't, or writing to memory as a double when it isn't. Any idea how I can go about debugging this?

The code is available here, but I'm not expecting detailed analysis of the code, more tips as to how this kind of error can occur using MPI communication, and what I might be able to do to track down the error.

Just to confirm a few things I've tried: it's not a problem with the initialisation of the array. I've tried initialising the array to a known value (999) and that doesn't appear in the array at the end, so obviously all of the new values (including the crazy ones) are coming from the MPI communications.
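One way to narrow down where such values first appear is to scan each receive buffer as soon as its receive has completed, since a buffer posted with MPI_Irecv must not be read until the matching MPI_Wait or MPI_Waitall returns. The helper below is only a sketch of that idea and is not taken from the linked code: the name `first_suspect_value` and the `limit` threshold are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the original code): returns the index of the
 * first element that is NaN, infinite, or larger in magnitude than `limit`,
 * or -1 if every element looks plausible. */
long first_suspect_value(const double *buf, size_t count, double limit)
{
    assert(buf != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        /* The negated range test is also true for NaN, because every
         * comparison involving NaN evaluates to false. */
        if (!(buf[i] >= -limit && buf[i] <= limit)) {
            return (long)i;
        }
    }
    return -1;
}
```

Calling this on every receive buffer right after the wait, and printing the rank, the source rank, and the returned index, should show whether the bad values arrive with a particular message or only appear later when the data is copied into the result array.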

Any ideas?

Mark Wilkins · Accepted Answer · 2011-04-09T22:07:32

One potential issue is inconsistent array indexing. At line 223, it appears that i and j might be backwards. (I'm not sure the line numbers match up when viewing this on an iPod touch; it is the loop with the comment "Calculate the offsets".) There, j and i are swapped, rows for columns, compared to the other loops. It appears that you have adjusted that a couple of different ways in the comments, so maybe it is intended. I can't see the whole code very well on the iPod touch's limited view, but that part does seem incorrect.

The final loop also seems incorrect: j and i are again reversed compared to the other loops.
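To illustrate why swapped indices could produce garbage rather than merely misplaced values, here is a small sketch. It assumes the local block is stored as a flat row-major array, which may not match the actual code, and the function names are hypothetical. It counts how many (i, j) pairs land outside the block when the two indices are passed in the wrong order.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (row, col) in a flat row-major array with `cols` columns. */
size_t row_major_offset(size_t row, size_t col, size_t cols)
{
    return row * cols + col;
}

/* Hypothetical illustration: for 0 <= i < rows and 0 <= j < cols, counts the
 * index pairs whose offset falls outside the rows * cols block when j is
 * mistakenly used as the row and i as the column. */
size_t count_out_of_bounds_when_swapped(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    size_t bad = 0;
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            /* Bug under study: arguments are in the wrong order. */
            if (row_major_offset(j, i, cols) >= rows * cols) {
                ++bad;
            }
        }
    }
    return bad;
}
```

For a 2 x 5 block, 6 of the 10 index pairs fall outside the array, so those reads or writes touch whatever memory happens to sit next to it, and those bytes interpreted as doubles can look like the huge values in the question. For a square block nothing goes out of bounds, but the wrong elements are still accessed, so the bug can stay hidden until the block dimensions differ.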
