SciPy sparse matrices are typically manipulated through their built-in methods. But sometimes you need to read the matrix data in order to assign it to non-sparse data types. For the sake of demonstration, I created a random LIL sparse matrix and copied it into a NumPy array using different methods (pure Python data types would have made more sense!).
from __future__ import print_function
from scipy.sparse import rand, csr_matrix, lil_matrix
import numpy as np
dim = 1000
lil = rand(dim, dim, density=0.01, format='lil', dtype=np.float32, random_state=0)
print('number of nonzero elements:', lil.nnz)
arr = np.zeros(shape=(dim,dim), dtype=float)
number of nonzero elements: 10000
Reading by indexing
%%timeit -n3
for i in range(dim):
    for j in range(dim):
        arr[i, j] = lil[i, j]
3 loops, best of 3: 6.42 s per loop
Using the nonzero() method
%%timeit -n3
nnz = lil.nonzero() # indices of nonzero values
for i, j in zip(nnz[0], nnz[1]):
    arr[i, j] = lil[i, j]
3 loops, best of 3: 75.8 ms per loop
Using the built-in method to convert directly to array
This is not a general way to read the matrix data, so I do not count it as a solution.
%timeit -n3 arr = lil.toarray()
3 loops, best of 3: 7.85 ms per loop
Reading SciPy sparse matrices with these methods is not efficient at all. Is there a faster way to read these matrices?
arr? That's not fast. But in general indexing a sparse matrix is slower than indexing a dense array. If you really need speed, work directly with the data attributes of the matrix - at the expense of generality. – hpaulj
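As a minimal sketch of what "work directly with the data attributes" could look like (my reading of the comment, not code posted by hpaulj): a LIL matrix keeps its column indices in `rows` and its values in `data`, one list per row, and a COO matrix exposes parallel `row`, `col` and `data` arrays. The names `arr_lil`, `arr_coo` and `triples` are made up for illustration, and nothing here has been timed against the numbers in the question.

```python
import numpy as np
from scipy.sparse import rand

dim = 1000
lil = rand(dim, dim, density=0.01, format='lil', dtype=np.float32, random_state=0)

# 1) LIL-specific: lil.rows[i] lists the column indices stored in row i,
#    and lil.data[i] lists the matching values.
arr_lil = np.zeros((dim, dim), dtype=float)
for i, (cols, vals) in enumerate(zip(lil.rows, lil.data)):
    if cols:  # skip rows that store nothing
        arr_lil[i, cols] = vals

# 2) Via COO: three parallel arrays (row, col, data), so a single
#    fancy-indexed assignment copies every stored value.
coo = lil.tocoo()
arr_coo = np.zeros((dim, dim), dtype=float)
arr_coo[coo.row, coo.col] = coo.data

# 3) Pure Python types: a list of (row, col, value) tuples.
triples = [(i, int(j), float(v))
           for i, (cols, vals) in enumerate(zip(lil.rows, lil.data))
           for j, v in zip(cols, vals)]
```

The first loop only works for the LIL format, which is the loss of generality the comment mentions; the COO route costs one format conversion but then needs no Python-level loop at all.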