Updating sparse connectivity matrix in scipy

Question

I'm working with a connectivity matrix that is a representation of a graph datastructure. The NxM matrix corresponds to N edges with M vertices (it's likely to have more edges than vertices, which is why I am working with scipy's csr_matrix). The "start" point of the edge is represented by "-1" and the end point is represent by "1" in the connectivity matrix. All other values are 0, so each row only has 2 nonzero values.

I need to integrate a "subdivide" method, which will efficiently update the connectivity matrix. Currently I am transforming the connectivity matrix to a dense matrix so I can add the new rows/columns and update the old ones. I am converting to a dense matrix as I haven't found a solution to finding the column index for updating the old edge connectivity (no equivalent scipy.where) and the csr representation does not allow me to update values via indexing.

from numpy import where, array, zeros, hstack, vstack
from scipy.sparse import coo_matrix, csr_matrix

def connectivity_matrix(edges):
    m    = len(edges)
    data = array([-1] * m + [1] * m)
    rows = array(list(range(m)) + list(range(m)))
    cols = array([edge[0] for edge in edges] + [edge[1] for edge in edges])
    C    = coo_matrix((data, (rows, cols))).asfptype()
    return C.tocsr()

def subdivide_edges(C, edge_indices):
    C = C.todense()
    num_e = C.shape[0] # number of edges
    num_v = C.shape[1] # number of vertices

    for edge in edge_indices:
        num_e += 1 # increment row (edge count)
        num_v += 1 # increment column (vertex count)
        _, start = where(C[edge] == -1.0)
        _, end   = where(C[edge] == 1.0)
        si    = start[0]
        ei    = end[0]
        # add row
        r, c = C.shape
        new_r = zeros((1, c))
        C = vstack([C, new_r])
        # add column
        r, c = C.shape
        new_c = zeros((r, 1))
        C = hstack([C, new_c])
        # edit edge start/end points
        C[edge, ei] = 0.0
        C[edge, num_v - 1] = 1.0
        # add new edge start/end points
        C[num_e - 1, ei] = 1.0
        C[num_e - 1, num_v - 1] = -1.0

    return csr_matrix(C)

edges = [(0, 1), (1, 2)] # edge connectivity
C = connectivity_matrix(edges)
C = subdivide_edges(C, [0, 1])
# new edge connectivity: [(0, 3), (1, 4), (3, 1), (4, 2)]

hpaulj hpaulj · Accepted Answer · 2017-10-06T16:25:56

A sparse matrix does have a nonzero method (np.where uses np.nonzero). But look at its code - it returns coo row/cols data.

Using a sparse matrix left over from another question:

In [468]: M
Out[468]: 
<5x5 sparse matrix of type '<class 'numpy.float64'>'
    with 5 stored elements in COOrdinate format>
In [469]: Mc = M.tocsr()
In [470]: Mc.nonzero()
Out[470]: (array([0, 1, 2, 3, 4], dtype=int32), array([2, 0, 4, 3, 1], dtype=int32))
In [471]: Mc[1,:].nonzero()
Out[471]: (array([0]), array([0]))
In [472]: Mc[3,:].nonzero()
Out[472]: (array([0]), array([3]))

I converted to csr to do the row index.

There is also a sparse vstack.

But iterative work on sparse matrix is slow compared to dense arrays.

Be wary of float comparisons like C[edge] == -1.0. == tests work much better with integers.

Changing values from zero to nonzero does raise a warning, but does work:

In [473]: Mc[1,1] = 23
/usr/local/lib/python3.5/dist-packages/scipy/sparse/compressed.py:774: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  SparseEfficiencyWarning)
In [474]: (Mc[1,:]==23).nonzero()
Out[474]: (array([0]), array([1]))

Changing nonzeros to zero doesn't produce the warning, but it also doesn't change the underlying sparsity (until the matrix is cleaned up). lil format is better for element by element changes.

In [478]: Ml = M.tolil()
In [479]: Ml.nonzero()
Out[479]: (array([0, 1, 2, 3, 4], dtype=int32), array([2, 0, 4, 3, 1], dtype=int32))
In [480]: Ml[1,:].nonzero()
Out[480]: (array([0], dtype=int32), array([0], dtype=int32))
In [481]: Ml[1,2]=.5
In [482]: Ml[1,:].nonzero()
Out[482]: (array([0, 0], dtype=int32), array([0, 2], dtype=int32))
In [483]: (Ml[1,:]==.5).nonzero()
Out[483]: (array([0], dtype=int32), array([2], dtype=int32))

In [486]: sparse.vstack((Ml,Ml),format='lil')
Out[486]: 
<10x5 sparse matrix of type '<class 'numpy.float64'>'
    with 12 stored elements in LInked List format>

sparse.vstack works by converting the inputs to coo, and joining their attributes (rows, cols, data), and making a new matrix.

I suspect that your code will work with a lil matrix without too many changes. But it probably will be slower. Sparse gets its best speed when doing things like matrix multiplication on low density matrices. It also helps when the dense equivalents are too large to fit in memory. But for iterative work and growing matrices it is slow.

Updating sparse connectivity matrix in scipy

1 Answers