I've recently been working with sparse matrices. My aim is to convert an adjacency list for a graph into the CSR format, as defined here: http://devblogs.nvidia.com/parallelforall/wp-content/uploads/2014/07/CSR.png.

One option I see is to first construct a NumPy matrix and then convert it using scipy.sparse.csr_matrix. The problem is that the CSR output in SciPy looks somewhat different from the one shown in the link. My question: is this a real discrepancy, so that I need to write my own parser, or can SciPy in fact convert into the CSR format defined in the link?

A bit more about the problem. Let's say I have a matrix:

matrix([[1, 1, 0],
        [0, 0, 1],
        [1, 0, 1]])

As I understand it, the CSR format for this consists of two arrays, column (C) and row (R). The result I am aiming for looks like this:

C: [0,1,2,0,2]

R: [0,2,3,5]

SciPy returns:

  (0, 0)    1
  (0, 1)    1
  (1, 2)    1
  (2, 0)    1
  (2, 2)    1

where the second column is the same as my C, yet to my understanding this is the COO format, not CSR. (This was produced by calling csr_matrix(adjacency_matrix).)

1 Answer

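There is a difference between what is stored internally and what you see when you simply print the matrix via print(A) (where A is a csr_matrix). The printed output lists (row, column) value triples for readability, which is why it looks like COO.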

In the documentation the attributes are listed. Among others there are the following three attributes:

data:     CSR format data array of the matrix
indices:  CSR format index array of the matrix
indptr:   CSR format index pointer array of the matrix

You can access (and manipulate) them through A.data, A.indices and A.indptr.
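
As a quick check, here is a minimal sketch that builds the matrix from the question and reads the three arrays back. The variable names are only for illustration; indices should correspond to the C array and indptr to the R array from the question.

```python
import numpy as np
from scipy.sparse import csr_matrix

# The example matrix from the question, as a dense NumPy array.
adjacency_matrix = np.array([[1, 1, 0],
                             [0, 0, 1],
                             [1, 0, 1]])

A = csr_matrix(adjacency_matrix)

# The three internal CSR arrays, converted to plain lists for display.
data = A.data.tolist()        # non-zero values
indices = A.indices.tolist()  # column index of each non-zero value ("C")
indptr = A.indptr.tolist()    # where each row starts in data/indices ("R")

print(data)     # [1, 1, 1, 1, 1]
print(indices)  # [0, 1, 2, 0, 2]
print(indptr)   # [0, 2, 3, 5]
```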

Bottom line: The CSR format in SciPy is a "real" CSR format, and you do not need to write your own parser (as long as you ignore the data array, which is unnecessary in your case because every stored value is 1).
Also note: A matrix in CSR format is always represented by three arrays, not two.
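
If you would rather skip the dense NumPy matrix entirely, one possible approach is to build the three arrays straight from the adjacency list and pass them to csr_matrix as a (data, indices, indptr) tuple. This is only a sketch: the adjacency list layout below (adjacency[i] holds the neighbours of node i) is an assumption, since the question does not show its exact form, and the data array is simply filled with ones.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Assumed layout: adjacency[i] is the list of neighbours of node i.
# These values reproduce the example matrix from the question.
adjacency = [[0, 1], [2], [0, 2]]
n = len(adjacency)

# Column indices: all neighbour lists concatenated, sorted within each row.
indices = np.array([j for nbrs in adjacency for j in sorted(nbrs)],
                   dtype=np.int32)

# Row pointers: cumulative number of neighbours, starting at 0.
indptr = np.zeros(n + 1, dtype=np.int32)
indptr[1:] = np.cumsum([len(nbrs) for nbrs in adjacency])

# Every edge gets the value 1 (unweighted graph).
data = np.ones(len(indices), dtype=np.int8)

A = csr_matrix((data, indices, indptr), shape=(n, n))
dense = A.toarray().tolist()
print(dense)  # [[1, 1, 0], [0, 0, 1], [1, 0, 1]]
```

This avoids allocating an n-by-n dense array, which matters for large graphs.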