I am having trouble formatting a large 2D matrix into a "sparse matrix" format. I hope someone here can help.
Due to the large file size, an awk script should be prefered.
Here is a hypothetical set from the data (tab delimited):
Regions string8-0 string8-5000 string8-10000
string8-0 0 2.1 1.15
string8-5000 0 0 2.4
string8-10000 0 4 3.1
So, the resulted output should be a tab separeted file with four columns. Column1 = the numeric part of rownames (colomn1 from data; example: 0, 5000, 1000); Column2 = is the corresponding numeric part of the colnames (example: 0, 5000, 10000), Column3 = the value between the two bin. Column4 = string part of column names (string8, string8, string8).
And only print this if the corresponding value is not zero.
like this:
0 5000 2.1 string8
0 10000 1.15 string8
5000 10000 2.4 string8
10000 5000 4 string8
10000 10000 3.1 string8
Sorry, if it is a redundant question, but I could not find any solutions.
Thanks, Laszlo