I want to put a column from one sparse columnar matrix into another (empty) sparse columnar matrix. Toy code:

import numpy as np
import scipy.sparse
row = np.array([0, 2, 0, 1, 2])
col = np.array([0, 0, 2, 2, 2])
data = np.array([1, 2, 4, 5, 6])
M = scipy.sparse.csc_matrix((data, (row, col)), shape=(3, 3))
E = scipy.sparse.csc_matrix((3, 3))  # empty 3x3 sparse matrix

E[:, 1] = M[:, 0]

However I get the warning:

SparseEfficiencyWarning: Changing the sparsity structure of a csc_matrix is expensive. lil_matrix is more efficient.

This warning makes me fear that the matrix is converted to another format and then back to csc in the process, which would be inefficient. Can anyone confirm this, and is there a better solution?

1 Answer

The warning tells you that adding new nonzero values to a csc (or csr) matrix is expensive, because those formats aren't designed for changes to their sparsity structure. The lil format is designed to make that kind of change quick and easy, especially changes within a single row.
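To see why the compressed formats resist this kind of change, it can help to look at how each format stores the toy matrix M from the question. The sketch below only inspects attributes; nothing in it is specific to the timings further down.

```python
import numpy as np
import scipy.sparse

row = np.array([0, 2, 0, 1, 2])
col = np.array([0, 0, 2, 2, 2])
data = np.array([1, 2, 4, 5, 6])
M = scipy.sparse.csc_matrix((data, (row, col)), shape=(3, 3))

# csc keeps three flat arrays. Column j lives in
# data[indptr[j]:indptr[j + 1]], with its row numbers in the same
# slice of indices.
print(M.indptr)   # [0 2 2 5]  (column 1 is empty: indptr[1] == indptr[2])
print(M.indices)  # [0 2 0 1 2]
print(M.data)     # [1 2 4 5 6]

# Adding a value to the empty column 1 means inserting into the middle of
# indices and data, and shifting every later entry of indptr.

# lil keeps one Python list of column numbers and one list of values per
# row, so a change only touches the lists of the rows involved.
L = M.tolil()
print(L.rows)
print(L.data)
```

Because the csc arrays are contiguous, each new nonzero forces them to be rebuilt, which is the cost the warning refers to.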

Note that the coo format doesn't even implement this kind of indexing.

The assignment does not convert to lil and back, though that might actually be faster. Some timings:

In [679]: %%timeit E=sparse.csr_matrix((3,3))
     ...: E[:,1] = M[:,0]
     ...: 
/usr/lib/python3/dist-packages/scipy/sparse/compressed.py:730: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  SparseEfficiencyWarning)
1000 loops, best of 3: 845 µs per loop
In [680]: %%timeit E=sparse.csr_matrix((3,3))
     ...: E1=E.tolil()
     ...: E1[:,1] = M[:,0]
     ...: E=E1.tocsc()
     ...: 
The slowest run took 4.22 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.42 ms per loop

In [682]: %%timeit E=sparse.lil_matrix((3,3))
     ...: E[:,1] = M[:,0]
     ...: 
1000 loops, best of 3: 804 µs per loop
In [683]: %%timeit E=sparse.lil_matrix((3,3));M1=M.tolil()
     ...: E[:,1] = M1[:,0]
     ...: 
     ...: 
1000 loops, best of 3: 470 µs per loop

In [688]: timeit M1=M.tolil()
The slowest run took 4.10 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 248 µs per loop

Notice that doing the assignment with lil on both sides (470 µs) is about 2x faster than doing it with the compressed format (845 µs; that run used csr). But converting to lil and back adds time: the full round trip took 1.42 ms.
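The timings above come from an IPython session. A rough way to repeat the comparison as a plain script is sketched below, using the standard timeit module. The function names are made up for illustration, and unlike the %%timeit cells the construction of E is included in each timed call, so the absolute numbers will not match.

```python
import timeit
import warnings

import numpy as np
import scipy.sparse as sparse

row = np.array([0, 2, 0, 1, 2])
col = np.array([0, 0, 2, 2, 2])
data = np.array([1, 2, 4, 5, 6])
M = sparse.csc_matrix((data, (row, col)), shape=(3, 3))
M_lil = M.tolil()  # source converted once, outside the timed code


def assign_csc():
    # direct assignment into a compressed matrix (triggers the warning)
    E = sparse.csc_matrix((3, 3))
    E[:, 1] = M[:, 0]
    return E


def assign_via_lil():
    # convert to lil, assign, convert back
    E = sparse.csc_matrix((3, 3)).tolil()
    E[:, 1] = M[:, 0]
    return E.tocsc()


def assign_lil_both():
    # lil on both sides, no conversion back
    E = sparse.lil_matrix((3, 3))
    E[:, 1] = M_lil[:, 0]
    return E


timings = {}
with warnings.catch_warnings():
    # silence the efficiency warning so it does not flood the output
    warnings.simplefilter("ignore", sparse.SparseEfficiencyWarning)
    for fn in (assign_csc, assign_via_lil, assign_lil_both):
        best = min(timeit.repeat(fn, number=200, repeat=3)) / 200
        timings[fn.__name__] = best
        print(f"{fn.__name__}: {best * 1e6:.0f} µs per call")
```

Results depend on the scipy version and the machine, so the ratios are worth checking on your own setup and on realistically sized matrices.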

Warning or not, what you are doing is fastest for a one-time operation. But if you need to do this repeatedly, try to find a better way.
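For the repeated case, two possible approaches are sketched below. Neither comes from the timings above, and the column mapping is a made-up example: either fill a lil matrix and convert to csc once at the end, or skip assignment entirely and assemble the result from column slices with scipy.sparse.hstack.

```python
import numpy as np
import scipy.sparse

row = np.array([0, 2, 0, 1, 2])
col = np.array([0, 0, 2, 2, 2])
data = np.array([1, 2, 4, 5, 6])
M = scipy.sparse.csc_matrix((data, (row, col)), shape=(3, 3))

# Option 1: do all assignments in lil, then convert once.
M_lil = M.tolil()                                   # convert the source once
E = scipy.sparse.lil_matrix((3, 3), dtype=M.dtype)
for dst, src in [(1, 0), (0, 2)]:                   # hypothetical column mapping
    E[:, dst] = M_lil[:, src]
E = E.tocsc()                                       # single conversion at the end

# Option 2: build the result from column slices, with no item assignment.
empty_col = scipy.sparse.csc_matrix((3, 1), dtype=M.dtype)
H = scipy.sparse.hstack([M[:, [2]], M[:, [0]], empty_col], format="csc")

print(E.toarray())
print(H.toarray())
```

Both produce the same matrix here. Which one is faster will depend on the matrix size and on how many columns are copied, so it is worth timing on real data.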

=================

Setting rows vs. columns doesn't make much difference.

In [835]: %%timeit E=sparse.csc_matrix((3,3))
     ...: E[:,1]=M[:,0]
  SparseEfficiencyWarning)
1000 loops, best of 3: 1.89 ms per loop

In [836]: %%timeit E=sparse.csc_matrix((3,3))
     ...: E[1,:]=M[0,:]    
  SparseEfficiencyWarning)
1000 loops, best of 3: 1.91 ms per loop