I want to concatenate two csr_matrix objects, each with shape (1, N).

I know I should use scipy.sparse.vstack:

from scipy.sparse import csr_matrix,vstack
c1 = csr_matrix([[1, 2]])

c2 = csr_matrix([[3, 4]])

print c1.shape,c2.shape
print vstack([c1, c2], format='csr')

#prints:
(1, 2) (1, 2)
  (0, 0)    1
  (0, 1)    2
  (1, 0)    3
  (1, 1)    4

However, my code fails:

from scipy.sparse import csr_matrix,vstack
import numpy as np
y_train = np.array([1, 0, 1, 0, 1, 0])
X_train = csr_matrix([[1, 1], [-1, 1], [1, 0], [-1, 0], [1, -1], [-1, -1]])

c0 = X_train[y_train == 0].mean(axis=0)
c1 = X_train[y_train == 1].mean(axis=0)

print c0.shape, c1.shape #prints (1L, 2L) (1L, 2L)
print c0,c1 #prints [[-1.  0.]] [[ 1.  0.]]
print vstack([c0,c1], format='csr')

The last line raises an exception:

File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 484, in vstack
return bmat([[b] for b in blocks], format=format, dtype=dtype)

File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 533, in bmat
raise ValueError('blocks must be 2-D')
ValueError: blocks must be 2-D

I guess using mean has something to do with it. Any ideas?

prints [[-1. 0.]] [[ 1. 0.]] - that's not how sparse matrices are printed. Those are dense. - hpaulj
@hpaulj Yep, it was a bit weird ... I finally noticed that. - omerbp
mean and sum are performed by a dot product with a dense array (of ones), and the result is a dense matrix. Even if there is only one nonzero value in a row, that row sum will be nonzero. - hpaulj

1 Answer

Taking the mean of a sparse matrix returns a NumPy matrix (which is not sparse). So c0 and c1 are dense NumPy matrices, not sparse ones:

In [76]: type(c0)
Out[76]: numpy.matrixlib.defmatrix.matrix

In [89]: sparse.issparse(c0)
Out[89]: False

vstack expects its first argument to be a sequence of sparse matrices. So make (at least) the first matrix a sparse matrix:

In [31]: vstack([coo_matrix(c0), c1])
Out[31]: 
<2x2 sparse matrix of type '<type 'numpy.float64'>'
    with 2 stored elements in COOrdinate format>

In [32]: vstack([coo_matrix(c0), c1]).todense()
Out[32]: 
matrix([[-1.,  0.],
        [ 1.,  0.]])
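
For reference, here is a minimal sketch of the whole fix applied to the code from the question, assuming Python 3 and a reasonably recent SciPy. It wraps both dense means in csr_matrix rather than only the first, so it does not depend on how vstack treats mixed input. The name `centroids` is just an illustrative choice, not something from the question.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse, vstack

y_train = np.array([1, 0, 1, 0, 1, 0])
X_train = csr_matrix([[1, 1], [-1, 1], [1, 0], [-1, 0], [1, -1], [-1, -1]])

# mean(axis=0) on a sparse matrix gives back a dense 1 x N result
c0 = X_train[y_train == 0].mean(axis=0)
c1 = X_train[y_train == 1].mean(axis=0)
print(issparse(c0), issparse(c1))

# Convert each dense row back to a sparse matrix before stacking
centroids = vstack([csr_matrix(c0), csr_matrix(c1)], format='csr')
print(centroids.shape)
print(centroids.toarray())
```

If the result is only needed as a dense array anyway, stacking the dense means with np.vstack would likely be simpler than converting back to sparse.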