Fast nonzero indices per row/column for (sparse) 2D numpy array

Question

I am looking for the fastest way to obtain a list of the nonzero indices of a 2D array per row and per column. The following is a working piece of code:

preds = [matrix[:,v].nonzero()[0] for v in range(matrix.shape[1])]
descs = [matrix[v].nonzero()[0] for v in range(matrix.shape[0])]

Example input:

matrix = np.array([[0,0,0,0],[1,0,0,0],[1,1,0,0],[1,1,1,0]])

Example output

preds = [array([1, 2, 3]), array([2, 3]), array([3]), array([], dtype=int64)]
descs = [array([], dtype=int64), array([0]), array([0, 1]), array([0, 1, 2])]

(The lists are called preds and descs because they refer to the predecessors and descendants in a DAG when the matrix is interpreted as an adjacency matrix but this is not essential to the question.)

Timing example: For timing purposes, the following matrix is a good representative:

test_matrix = np.zeros(shape=(4096,4096),dtype=np.float32)
for k in range(16):
    test_matrix[256*(k+1):256*(k+2),256*k:256*(k+1)]=1

Background: In my code, these two lines take 75% of the time for a 4000x4000 matrix whereas the ensuing topological sort and DP algorithm take only the rest of the quarter. Roughly 5% of the values in the matrix are nonzero so a sparse-matrix solution may be applicable.

Thank you.

(On suggestion posted here as well: https://scicomp.stackexchange.com/questions/35242/fast-nonzero-indices-per-row-column-for-sparse-2d-numpy-array There are also answers there to which I will provide timings in the comments. This link contains an accepted answer that is twice as fast.)

I recommend you asking this on scicomp.stackexchange.com there is a better place — eusoubrasileiro
Could you please provide a minimal example including toy input/output? — norok2
Do you also have some indication of the input sizes, the typical sparsity coefficient and perhaps any assumption on the input (for example, your toy input is low triangular)? — norok2
@eusoubrasileiro, why did you recommend the scicomp board? This looks like Python/numpy programming question, not a theoretical one. Are you active on that board, and prepared to answer this question there? — hpaulj

norok2 norok2 · Accepted Answer · 2020-05-28T19:34:41

If you have enough motivation, Numba can do amazing things. Here is a quick implementation of the logic you need. Briefly, it computes the equivalent of np.nonzero() but it includes along the way the information to later dispatch the indices into the format you require. The information is inspired by sparse.csr.indptr and sparse.csc.indptr.

import numpy as np
import numba as nb


@nb.jit
def cumsum(arr):
    result = np.empty_like(arr)
    cumsum = result[0] = arr[0]
    for i in range(1, len(arr)):
        cumsum += arr[i]
        result[i] = cumsum
    return result


@nb.jit
def count_nonzero(arr):
    arr = arr.ravel()
    n = 0
    for x in arr:
        if x != 0:
            n += 1
    return n


@nb.jit
def row_col_nonzero_nb(arr):
    n, m = arr.shape
    max_k = count_nonzero(arr)
    indices = np.empty((2, max_k), dtype=np.uint32)
    i_offset = np.zeros(n + 1, dtype=np.uint32)
    j_offset = np.zeros(m + 1, dtype=np.uint32)
    n, m = arr.shape
    k = 0
    for i in range(n):
        for j in range(m):
            if arr[i, j] != 0:
                indices[:, k] = i, j
                i_offset[i + 1] += 1
                j_offset[j + 1] += 1
                k += 1
    return indices, cumsum(i_offset), cumsum(j_offset)


def row_col_idx_nonzero_nb(arr):
    (ii, jj), jj_split, ii_split = row_col_nonzero_nb(arr)
    ii_ = np.argsort(jj)
    ii = ii[ii_]
    return np.split(ii, ii_split[1:-1]), np.split(jj, jj_split[1:-1])

Compared to your approach (row_col_idx_sep() below), and a bunch of others, as per @hpaulj answer (row_col_idx_sparse_lil()) and @knl answer from scicomp.stackexchange.com (row_col_idx_sparse_coo()):

def row_col_idx_sep(arr):
    return (
        [arr[:, j].nonzero()[0] for j in range(arr.shape[1])],
        [arr[i, :].nonzero()[0] for i in range(arr.shape[0])],)

def row_col_idx_zip(arr):
    n, m = arr.shape
    ii = [[] for _ in range(n)]
    jj = [[] for _ in range(m)]
    x, y = np.nonzero(arr)
    for i, j in zip(x, y):
        ii[i].append(j)
        jj[j].append(i)
    return jj, ii

import scipy as sp
import scipy.sparse


def row_col_idx_sparse_coo(arr):
    coo_mat = sp.sparse.coo_matrix(arr)
    csr_mat = coo_mat.tocsr()
    csc_mat = coo_mat.tocsc()
    return (
        np.split(csc_mat.indices, csc_mat.indptr)[1:-1],
        np.split(csr_mat.indices, csr_mat.indptr)[1:-1],)

def row_col_idx_sparse_lil(arr):
    lil_mat = sp.sparse.lil_matrix(arr)
    return lil_mat.T.rows, lil_mat.rows

For inputs generated using:

def gen_input(n, density=0.1, dtype=np.float32):
    arr = np.zeros(shape=(n, n), dtype=dtype)
    indices = tuple(np.random.randint(0, n, (2, int(n * n * density))).tolist())
    arr[indices] = 1.0
    return arr

One would get (your test_matrix had approximately 0.06 non-zero density):

m = gen_input(4096, density=0.06)
%timeit row_col_idx_sep(m)
# 1 loop, best of 3: 767 ms per loop
%timeit row_col_idx_zip(m)
# 1 loop, best of 3: 660 ms per loop
%timeit row_col_idx_sparse_coo(m)
# 1 loop, best of 3: 205 ms per loop
%timeit row_col_idx_sparse_lil(m)
# 1 loop, best of 3: 498 ms per loop
%timeit row_col_idx_nonzero_nb(m)
# 10 loops, best of 3: 130 ms per loop

Indicating this to be close to twice as fast as the fastest scipy.sparse-based approach.

Fast nonzero indices per row/column for (sparse) 2D numpy array

2 Answers