I want to concatenate two csr_matrix objects, each with shape=(1,N).
I know I should use scipy.sparse.vstack:
from scipy.sparse import csr_matrix, vstack
c1 = csr_matrix([[1, 2]])
c2 = csr_matrix([[3, 4]])
print c1.shape, c2.shape
print vstack([c1, c2], format='csr')
#prints:
(1, 2) (1, 2)
(0, 0) 1
(0, 1) 2
(1, 0) 3
(1, 1) 4
However, my code fails:
from scipy.sparse import csr_matrix, vstack
import numpy as np
y_train = np.array([1, 0, 1, 0, 1, 0])
X_train = csr_matrix([[1, 1], [-1, 1], [1, 0], [-1, 0], [1, -1], [-1, -1]])
c0 = X_train[y_train == 0].mean(axis=0)
c1 = X_train[y_train == 1].mean(axis=0)
print c0.shape, c1.shape #prints (1L, 2L) (1L, 2L)
print c0, c1 #prints [[-1. 0.]] [[ 1. 0.]]
print vstack([c0,c1], format='csr')
The last line raises an exception:
  File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 484, in vstack
    return bmat([[b] for b in blocks], format=format, dtype=dtype)
  File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 533, in bmat
    raise ValueError('blocks must be 2-D')
ValueError: blocks must be 2-D
I guess using mean has something to do with it. Any ideas?
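One way to test that guess is to inspect the types directly. The sketch below (Python 3 syntax, reusing the question's toy data; the name rows0 is illustrative) checks whether the selected rows and their column mean are sparse, using scipy.sparse.issparse. For csr_matrix inputs the mean should come back as a dense NumPy matrix rather than a sparse one.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

y_train = np.array([1, 0, 1, 0, 1, 0])
X_train = csr_matrix([[1, 1], [-1, 1], [1, 0], [-1, 0], [1, -1], [-1, -1]])

rows0 = X_train[y_train == 0]  # boolean-mask row selection
c0 = rows0.mean(axis=0)        # column mean of the selected rows

# Row selection keeps the data sparse...
print(type(rows0).__name__, issparse(rows0))
# ...but the mean is a dense NumPy object, not a sparse matrix.
print(type(c0).__name__, issparse(c0))
```

If the second line reports a dense type, that would explain why scipy.sparse.vstack, which expects sparse blocks, rejects c0 and c1.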
prints [[-1. 0.]] [[ 1. 0.]]: that's not how sparse matrices are printed. Those are dense. – hpaulj
mean and sum are performed by dot multiplication with a dense array (of ones), and the result is a dense matrix. Even if there is only one nonzero value in a row, that row's sum will be nonzero. – hpaulj
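The sketch below (Python 3 syntax, reusing the question's toy data; X0, n_rows, manual_mean and the stacked_* names are illustrative) first reproduces the idea from the comment above by computing the class-0 column means as a sparse-times-ones product, which yields a dense result. It then shows two possible ways around the error: convert each dense mean back to csr_matrix before calling scipy.sparse.vstack, or, since the means are dense anyway, stack them with numpy.vstack. This is not necessarily how SciPy implements mean internally; it only illustrates why the output is dense.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse, vstack

y_train = np.array([1, 0, 1, 0, 1, 0])
X_train = csr_matrix([[1, 1], [-1, 1], [1, 0], [-1, 0], [1, -1], [-1, -1]])

X0 = X_train[y_train == 0]  # sparse rows belonging to class 0
n_rows = X0.shape[0]

# Column sums via a sparse matrix times a dense vector of ones;
# a sparse-times-dense-vector product returns a dense 1-D ndarray.
col_sums = X0.T @ np.ones(n_rows)
manual_mean = col_sums / n_rows
print(manual_mean)  # same values as X0.mean(axis=0), but as a 1-D array

c0 = X_train[y_train == 0].mean(axis=0)  # dense, shape (1, 2)
c1 = X_train[y_train == 1].mean(axis=0)

# Option 1: turn the dense rows back into sparse blocks before stacking.
stacked_sparse = vstack([csr_matrix(c0), csr_matrix(c1)], format='csr')

# Option 2: the means are dense anyway, so stack them with NumPy.
stacked_dense = np.vstack([c0, c1])

print(stacked_sparse.toarray())
print(stacked_dense)
```

Converting back to sparse is mainly worth it if the stacked result feeds further sparse operations; a handful of dense mean rows is usually fine to keep dense.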