
I have a 24866-by-13 matrix of ones and zeros and want to discover biclusters in it. I tried scikit-learn's spectral co-clustering and spectral biclustering, but both fail with the error "ValueError: array must not contain infs or NaNs."

The matrix is stored as a NumPy array, and I verified that it contains only ones and zeros, with no infs or NaNs. The error output for spectral co-clustering is:

>>> import pandas as pd
>>> from sklearn.cluster import bicluster
>>> RNAiDf = pd.read_table(dfFile, index_col=0)
>>> RNAiDf.head()
       HBEC30  H1155  HCC366  H1819  HCC44  HCC4017  H1993  H460  H2073  \
22848       1      0       0      0      0        1      0     0      0   
9625        0      0       0      0      0        0      0     0      0   
25          0      0       1      0      0        0      0     0      0   
27          0      0       0      0      0        0      0     0      0   
10188       0      0       1      0      0        0      0     0      1   

       H2009  H2122  H1395  HCC95  
22848      0      1      0      0  
9625       0      1      0      0  
25         0      0      0      1  
27         0      0      0      0  
10188      1      0      0      0  
>>> RNAiMatrix = RNAiDf.values
>>> RNAiMatrix.shape
(24866, 13)
>>> model = bicluster.SpectralCoclustering()
>>> model.fit(RNAiMatrix)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 122, in fit
    self._fit(X)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 271, in _fit
    u, v = self._svd(normalized_data, n_sv, n_discard=1)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 135, in _svd
    **kwargs)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/utils/extmath.py", line 296, in randomized_svd
    Q = randomized_range_finder(M, n_random, n_iter, random_state)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/utils/extmath.py", line 229, in randomized_range_finder
    Q, R = linalg.qr(Y, mode='economic')
  File "/work/anaconda3/lib/python3.4/site-packages/scipy/linalg/decomp_qr.py", line 127, in qr
    a1 = numpy.asarray_chkfinite(a)
  File "/work/anaconda3/lib/python3.4/site-packages/numpy/lib/function_base.py", line 668, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

and for the spectral biclustering:

>>> model = bicluster.SpectralBiclustering()
>>> model.fit(RNAiMatrix)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 122, in fit
    self._fit(X)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 440, in _fit
    u, v = self._svd(normalized_data, n_sv, n_discard)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 135, in _svd
    **kwargs)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/utils/extmath.py", line 296, in randomized_svd
    Q = randomized_range_finder(M, n_random, n_iter, random_state)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/utils/extmath.py", line 229, in randomized_range_finder
    Q, R = linalg.qr(Y, mode='economic')
  File "/work/anaconda3/lib/python3.4/site-packages/scipy/linalg/decomp_qr.py", line 127, in qr
    a1 = numpy.asarray_chkfinite(a)
  File "/work/anaconda3/lib/python3.4/site-packages/numpy/lib/function_base.py", line 668, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

The error seems to arise while computing the QR decomposition of an intermediate matrix. Changing the parameter that sets the number of biclusters makes no difference. I am using scikit-learn version 0.16.1, and none of the columns are constant. Any tips on what might be going wrong? Thanks in advance.

You need to provide example data and code. – MaxNoe
Also, which version of scikit-learn are you using? The randomized_range_finder function was changed slightly in the current master branch, so updating to the latest version might help. – Ibraim Ganiev
Any constant columns, maybe? – Has QUIT--Anony-Mousse
I'm using version 0.16.1, and none of the columns are constant. I've also updated my post to include code. – Jonathan Young
What happens if you call RNAiDf = RNAiDf.dropna() after reading? – MaxNoe

1 Answer


RNAiMatrix should be an affinity matrix in which "0" represents that two elements are the same. I think you should modify this matrix accordingly; you can follow "Using the class sklearn.cluster.SpectralClustering with parameter affinity='precomputed'".
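
One more thing that may be worth ruling out, offered as an assumption and not a confirmed diagnosis: the fourth row printed by head() in the question (index 27) is all zeros. If the normalization step divides by the square root of the row and column sums, a zero sum produces inf, and inf * 0 produces NaN even though the input itself contains neither. The sketch below uses only NumPy. The helper names empty_rows_and_cols and drop_empty are hypothetical, and the scaling shown is a simplified stand-in for illustration, not scikit-learn's own code.

```python
import numpy as np


def empty_rows_and_cols(matrix):
    """Return the indices of rows and of columns whose entries sum to zero."""
    matrix = np.asarray(matrix)
    empty_rows = np.flatnonzero(matrix.sum(axis=1) == 0)
    empty_cols = np.flatnonzero(matrix.sum(axis=0) == 0)
    return empty_rows, empty_cols


def drop_empty(matrix):
    """Return a copy of the matrix without its all-zero rows and columns."""
    matrix = np.asarray(matrix)
    empty_rows, empty_cols = empty_rows_and_cols(matrix)
    kept = np.delete(matrix, empty_rows, axis=0)
    return np.delete(kept, empty_cols, axis=1)


# The five rows shown by RNAiDf.head() in the question, copied by hand.
sample = np.array([
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],  # 22848
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],  # 9625
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # 25
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 27 (all zeros)
    [0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0],  # 10188
])

empty_rows, empty_cols = empty_rows_and_cols(sample)
print("all-zero rows:", empty_rows)
print("all-zero columns (in this 5-row sample only):", empty_cols)

# Simplified row scaling by 1 / sqrt(row sum); a zero row sum gives inf,
# and inf * 0 gives NaN. Warnings are silenced to keep the output readable.
with np.errstate(divide="ignore", invalid="ignore"):
    row_scale = 1.0 / np.sqrt(sample.sum(axis=1))
    scaled = row_scale[:, np.newaxis] * sample
print("NaNs after scaling:", np.isnan(scaled).any())

# Filtering the empty rows and columns first keeps the scaling finite.
cleaned = drop_empty(sample)
cleaned_scaled = cleaned / np.sqrt(cleaned.sum(axis=1))[:, np.newaxis]
print("finite after filtering:", np.isfinite(cleaned_scaled).all())
```

On the full data, the same check would be empty_rows_and_cols(RNAiMatrix). If it reports any all-zero rows, fitting the model on the filtered matrix would show whether they are what triggers the error.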