I have a 24866-by-13 matrix of ones and zeros and want to discover biclusters in it. I tried scikit-learn's spectral co-clustering and spectral biclustering, but both fail with the error "ValueError: array must not contain infs or NaNs."
The matrix is stored as a NumPy array, and I verified that it contains only ones and zeros, with no infs or NaNs. The session and traceback for the spectral co-clustering are:
>>> RNAiDf = pd.read_table(dfFile, index_col=0)
>>> RNAiDf.head()
       HBEC30  H1155  HCC366  H1819  HCC44  HCC4017  H1993  H460  H2073  \
22848       1      0       0      0      0        1      0     0      0
9625        0      0       0      0      0        0      0     0      0
25          0      0       1      0      0        0      0     0      0
27          0      0       0      0      0        0      0     0      0
10188       0      0       1      0      0        0      0     0      1

       H2009  H2122  H1395  HCC95
22848      0      1      0      0
9625       0      1      0      0
25         0      0      0      1
27         0      0      0      0
10188      1      0      0      0
>>> RNAiMatrix = RNAiDf.values
>>> RNAiMatrix.shape
(24866, 13)
>>> model = bicluster.SpectralCoclustering()
>>> model.fit(RNAiMatrix)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 122, in fit
    self._fit(X)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 271, in _fit
    u, v = self._svd(normalized_data, n_sv, n_discard=1)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 135, in _svd
    **kwargs)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/utils/extmath.py", line 296, in randomized_svd
    Q = randomized_range_finder(M, n_random, n_iter, random_state)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/utils/extmath.py", line 229, in randomized_range_finder
    Q, R = linalg.qr(Y, mode='economic')
  File "/work/anaconda3/lib/python3.4/site-packages/scipy/linalg/decomp_qr.py", line 127, in qr
    a1 = numpy.asarray_chkfinite(a)
  File "/work/anaconda3/lib/python3.4/site-packages/numpy/lib/function_base.py", line 668, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
and for the spectral biclustering:
>>> model = bicluster.SpectralBiclustering()
>>> model.fit(RNAiMatrix)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 122, in fit
    self._fit(X)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 440, in _fit
    u, v = self._svd(normalized_data, n_sv, n_discard)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 135, in _svd
    **kwargs)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/utils/extmath.py", line 296, in randomized_svd
    Q = randomized_range_finder(M, n_random, n_iter, random_state)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/utils/extmath.py", line 229, in randomized_range_finder
    Q, R = linalg.qr(Y, mode='economic')
  File "/work/anaconda3/lib/python3.4/site-packages/scipy/linalg/decomp_qr.py", line 127, in qr
    a1 = numpy.asarray_chkfinite(a)
  File "/work/anaconda3/lib/python3.4/site-packages/numpy/lib/function_base.py", line 668, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
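In both cases the error is raised by the QR decomposition inside randomized_range_finder, which is called from the randomized SVD of the normalized data, so the non-finite values seem to appear after my input has been normalized rather than in the input itself. Changing the parameter that sets the number of biclusters makes no difference. I am using scikit-learn version 0.16.1, and none of the columns are constant. Any tips on what might be going wrong? Thanks in advance.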
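For anyone trying to reproduce this, here is a minimal sketch of the kind of check described above, run on a small made-up array rather than the real data. It also looks for all-zero rows and columns. This rests on an assumption, not something confirmed for this version of scikit-learn: that the spectral methods scale the data by something like 1/sqrt(row sum) and 1/sqrt(column sum) before the SVD, in which case a row or column summing to zero would yield an inf even though the input is finite. The names X, zero_rows, zero_cols and X_clean are illustrative only.

```python
import numpy as np

# Hypothetical stand-in for the real 24866-by-13 binary matrix.
X = np.array([
    [1, 0, 0, 1],
    [0, 0, 0, 0],   # an all-zero row, like row 27 in the head() output
    [0, 1, 0, 0],
])

# The input itself is finite and strictly binary.
print("all finite:", np.isfinite(X).all())
print("only 0/1:  ", np.isin(X, [0, 1]).all())

# Rows or columns that sum to zero are the suspects under the
# assumed 1/sqrt(sum) scaling.
row_sums = X.sum(axis=1)
col_sums = X.sum(axis=0)
zero_rows = np.flatnonzero(row_sums == 0)
zero_cols = np.flatnonzero(col_sums == 0)
print("all-zero rows:   ", zero_rows)
print("all-zero columns:", zero_cols)

# Illustration of the assumed scaling: dividing by sqrt(0) gives inf.
with np.errstate(divide="ignore"):
    row_scale = 1.0 / np.sqrt(row_sums)
print("row scale factors:", row_scale)

# Keep only rows and columns that have at least one nonzero entry.
X_clean = X[np.ix_(row_sums > 0, col_sums > 0)]
print("shape after filtering:", X_clean.shape)
```

If the real matrix turns out to have all-zero rows, fitting on the filtered array (and mapping the resulting labels back to the original row index) would be one thing to try; whether that removes the error here is untested.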
RNAiDf = RNAiDf.dropna()
after reading? – MaxNoe