2
votes

I am looking for the fastest way to obtain a list of the nonzero indices of a 2D array per row and per column. The following is a working piece of code:

preds = [matrix[:,v].nonzero()[0] for v in range(matrix.shape[1])]
descs = [matrix[v].nonzero()[0] for v in range(matrix.shape[0])]

Example input:

matrix = np.array([[0,0,0,0],[1,0,0,0],[1,1,0,0],[1,1,1,0]])

Example output

preds = [array([1, 2, 3]), array([2, 3]), array([3]), array([], dtype=int64)]
descs = [array([], dtype=int64), array([0]), array([0, 1]), array([0, 1, 2])]

(The lists are called preds and descs because they refer to the predecessors and descendants in a DAG when the matrix is interpreted as an adjacency matrix but this is not essential to the question.)

Timing example: For timing purposes, the following matrix is a good representative:

test_matrix = np.zeros(shape=(4096,4096),dtype=np.float32)
for k in range(16):
    test_matrix[256*(k+1):256*(k+2),256*k:256*(k+1)]=1

Background: In my code, these two lines take 75% of the time for a 4000x4000 matrix whereas the ensuing topological sort and DP algorithm take only the rest of the quarter. Roughly 5% of the values in the matrix are nonzero so a sparse-matrix solution may be applicable.

Thank you.

(On suggestion posted here as well: https://scicomp.stackexchange.com/questions/35242/fast-nonzero-indices-per-row-column-for-sparse-2d-numpy-array There are also answers there to which I will provide timings in the comments. This link contains an accepted answer that is twice as fast.)

2
I recommend you asking this on scicomp.stackexchange.com there is a better placeeusoubrasileiro
Could you please provide a minimal example including toy input/output?norok2
@norok2 Yes you are right. I added it.Richard Schoonhoven
Do you also have some indication of the input sizes, the typical sparsity coefficient and perhaps any assumption on the input (for example, your toy input is low triangular)?norok2
@eusoubrasileiro, why did you recommend the scicomp board? This looks like Python/numpy programming question, not a theoretical one. Are you active on that board, and prepared to answer this question there?hpaulj

2 Answers

5
votes

If you have enough motivation, Numba can do amazing things. Here is a quick implementation of the logic you need. Briefly, it computes the equivalent of np.nonzero() but it includes along the way the information to later dispatch the indices into the format you require. The information is inspired by sparse.csr.indptr and sparse.csc.indptr.

import numpy as np
import numba as nb


@nb.jit
def cumsum(arr):
    result = np.empty_like(arr)
    cumsum = result[0] = arr[0]
    for i in range(1, len(arr)):
        cumsum += arr[i]
        result[i] = cumsum
    return result


@nb.jit
def count_nonzero(arr):
    arr = arr.ravel()
    n = 0
    for x in arr:
        if x != 0:
            n += 1
    return n


@nb.jit
def row_col_nonzero_nb(arr):
    n, m = arr.shape
    max_k = count_nonzero(arr)
    indices = np.empty((2, max_k), dtype=np.uint32)
    i_offset = np.zeros(n + 1, dtype=np.uint32)
    j_offset = np.zeros(m + 1, dtype=np.uint32)
    n, m = arr.shape
    k = 0
    for i in range(n):
        for j in range(m):
            if arr[i, j] != 0:
                indices[:, k] = i, j
                i_offset[i + 1] += 1
                j_offset[j + 1] += 1
                k += 1
    return indices, cumsum(i_offset), cumsum(j_offset)


def row_col_idx_nonzero_nb(arr):
    (ii, jj), jj_split, ii_split = row_col_nonzero_nb(arr)
    ii_ = np.argsort(jj)
    ii = ii[ii_]
    return np.split(ii, ii_split[1:-1]), np.split(jj, jj_split[1:-1])

Compared to your approach (row_col_idx_sep() below), and a bunch of others, as per @hpaulj answer (row_col_idx_sparse_lil()) and @knl answer from scicomp.stackexchange.com (row_col_idx_sparse_coo()):

def row_col_idx_sep(arr):
    return (
        [arr[:, j].nonzero()[0] for j in range(arr.shape[1])],
        [arr[i, :].nonzero()[0] for i in range(arr.shape[0])],)
def row_col_idx_zip(arr):
    n, m = arr.shape
    ii = [[] for _ in range(n)]
    jj = [[] for _ in range(m)]
    x, y = np.nonzero(arr)
    for i, j in zip(x, y):
        ii[i].append(j)
        jj[j].append(i)
    return jj, ii
import scipy as sp
import scipy.sparse


def row_col_idx_sparse_coo(arr):
    coo_mat = sp.sparse.coo_matrix(arr)
    csr_mat = coo_mat.tocsr()
    csc_mat = coo_mat.tocsc()
    return (
        np.split(csc_mat.indices, csc_mat.indptr)[1:-1],
        np.split(csr_mat.indices, csr_mat.indptr)[1:-1],)
def row_col_idx_sparse_lil(arr):
    lil_mat = sp.sparse.lil_matrix(arr)
    return lil_mat.T.rows, lil_mat.rows

For inputs generated using:

def gen_input(n, density=0.1, dtype=np.float32):
    arr = np.zeros(shape=(n, n), dtype=dtype)
    indices = tuple(np.random.randint(0, n, (2, int(n * n * density))).tolist())
    arr[indices] = 1.0
    return arr

One would get (your test_matrix had approximately 0.06 non-zero density):

m = gen_input(4096, density=0.06)
%timeit row_col_idx_sep(m)
# 1 loop, best of 3: 767 ms per loop
%timeit row_col_idx_zip(m)
# 1 loop, best of 3: 660 ms per loop
%timeit row_col_idx_sparse_coo(m)
# 1 loop, best of 3: 205 ms per loop
%timeit row_col_idx_sparse_lil(m)
# 1 loop, best of 3: 498 ms per loop
%timeit row_col_idx_nonzero_nb(m)
# 10 loops, best of 3: 130 ms per loop

Indicating this to be close to twice as fast as the fastest scipy.sparse-based approach.

1
votes
In [182]: arr = np.array([[0,0,0,0],[1,0,0,0],[1,1,0,0],[1,1,1,0]])                      

The data is present in the whole-array nonzero, just not broken up into per row/column arrays:

In [183]: np.nonzero(arr)                                                                
Out[183]: (array([1, 2, 2, 3, 3, 3]), array([0, 0, 1, 0, 1, 2]))
In [184]: np.argwhere(arr)                                                               
Out[184]: 
array([[1, 0],
       [2, 0],
       [2, 1],
       [3, 0],
       [3, 1],
       [3, 2]])

It might be possible to break the array([1, 2, 2, 3, 3, 3]) into sublists, [1,2,3],[2,3],[3],[] based on the other array. But it may take some time to work out the logic for that, and there's no guarantee that it will be faster than your row/column iterations.

Logical operations can reduce the boolean array to column or row, giving the rows or columns where nonzero occurs, but again not ragged:

In [185]: arr!=0                                                                         
Out[185]: 
array([[False, False, False, False],
       [ True, False, False, False],
       [ True,  True, False, False],
       [ True,  True,  True, False]])
In [186]: (arr!=0).any(axis=0)                                                           
Out[186]: array([ True,  True,  True, False])
In [187]: np.nonzero((arr!=0).any(axis=0))                                               
Out[187]: (array([0, 1, 2]),)
In [188]: np.nonzero((arr!=0).any(axis=1))                                               
Out[188]: (array([1, 2, 3]),)
In [189]: arr                                                                            
Out[189]: 
array([[0, 0, 0, 0],
       [1, 0, 0, 0],
       [1, 1, 0, 0],
       [1, 1, 1, 0]])

The scipy.sparse lil format does generate the data you want:

In [190]: sparse                                                                         
Out[190]: <module 'scipy.sparse' from '/usr/local/lib/python3.6/dist-packages/scipy/sparse/__init__.py'>
In [191]: M = sparse.lil_matrix(arr)                                                     
In [192]: M                                                                              
Out[192]: 
<4x4 sparse matrix of type '<class 'numpy.longlong'>'
    with 6 stored elements in List of Lists format>
In [193]: M.rows                                                                         
Out[193]: array([list([]), list([0]), list([0, 1]), list([0, 1, 2])], dtype=object)
In [194]: M.T                                                                            
Out[194]: 
<4x4 sparse matrix of type '<class 'numpy.longlong'>'
    with 6 stored elements in List of Lists format>
In [195]: M.T.rows                                                                       
Out[195]: array([list([1, 2, 3]), list([2, 3]), list([3]), list([])], dtype=object)

But timing probably isn't any better than your row or column iteration.