3
votes

I am working with 3-dimensional numpy arrays, on which I will ultimately perform PCA. I first flatten the 3-D arrays into 2-D so that I can calculate covariance (and then eigenvalues and eigenvectors).
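As a minimal sketch of that flattening step, assuming the input has shape (5, 3, 3) like the test array further down, a single reshape should give the same 2-D array as filling the rows one at a time. The names `a`, `b` and `b_loop` are illustrative only.

```python
import numpy as np

a = np.random.random((5, 3, 3))    # stand-in for the real 3-D input
b = a.reshape(a.shape[0], -1)      # shape (5, 9): one flattened 3x3 slice per row

# Reference version: flatten each slice individually, then stack the rows
b_loop = np.array([a_i.flatten() for a_i in a])

print(b.shape, np.array_equal(b, b_loop))
```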

In calculating the covariance matrix I get different results using numpy.cov vs. numpy.dot. If my 2-D array is (5,9) I want to end up with a 5x5 (i.e., NxN) covariance matrix. This is what I get using numpy.dot. With numpy.cov, I end up with a covariance matrix that is 9x9. This does not fit the shape of what I need, but honestly I don't know which one is correct. I have seen both approaches used for calculating covariance in the examples I've studied.

If I carry the numpy.dot vs. numpy.cov through the numpy.linalg.eig calculation, I obviously get different answers (all printed below in the example output). So, I'm pretty confused at this point about which approach is correct, or where I may be going wrong.

Here is the test code with output. Thanks for any help.

import numpy as np

a = np.random.random((5, 3, 3))  # example of what real input will look like

# create 2D flattened version of 3D input array
d1, d2, d3 = a.shape
b = np.zeros([d1, d2*d3])
for i in range(len(a)):
    b[i] = a[i].flatten()

print("shape of 3D array: ", a.shape)
print("shape of flattened 2D array: ", b.shape, "\n")
print("flattened 2D array:\n", b, "\n")

# mean-center the flattened array (subtract each column's mean)
b -= np.mean(b, axis=0)

# calculate the covariance matrix of the flattened array
covar1 = np.cov(b, rowvar=0)   # this makes a 9x9 array
covar2 = np.dot(b, b.T)        # this makes a 5x5 array

print("covariance via numpy.cov:\n", covar1, "\n")
print("covariance via numpy.dot:\n", covar2, "\n")

# calculate eigenvalues and eigenvectors
eval1, evec1 = np.linalg.eig(covar1)
eval2, evec2 = np.linalg.eig(covar2)

print("eigenvalues via numpy.cov covariance matrix:\n", eval1, "\n")
print("eigenvectors via numpy.cov covariance matrix:\n", evec1, "\n")
print("eigenvalues via numpy.dot covariance matrix:\n", eval2, "\n")
print("eigenvectors via numpy.dot covariance matrix:\n", evec2, "\n")


======= Output =======

shape of 3D array:  (5, 3, 3)
shape of flattened 2D array:  (5, 9)

flattened 2D array:
[[ 0.94964127  0.71015973  0.80994774  0.49727821  0.38270025  0.89136202
   0.19876615  0.72461047  0.43646456]
 [ 0.00502329  0.70593521  0.44001479  0.97576486  0.37261107  0.6318449
   0.86301405  0.21820704  0.91507706]
 [ 0.75411747  0.98462782  0.65109776  0.1083943   0.12867679  0.63172813
   0.85803498  0.89507165  0.62291308]
 [ 0.88589874  0.02797773  0.6421045   0.17255432  0.5713524   0.28589519
   0.55888288  0.7961657   0.4453764 ]
 [ 0.85774793  0.19511453  0.92167001  0.27340606  0.41849435  0.98349776
   0.19354437  0.2974041   0.52064868]]

covariance via numpy.cov():
[[ 0.15180806 -0.04977355  0.05733885 -0.11340765  0.00840097  0.01461576
  -0.08596712  0.07512366 -0.07509614]
 [-0.04977355  0.15853367 -0.02337953  0.0357429  -0.05604085  0.02600021
   0.06158462  0.0229808   0.03506849]
 [ 0.05733885 -0.02337953  0.0335786  -0.03485899  0.00294469  0.03209583
  -0.05378417  0.00490397 -0.02751816]
 [-0.11340765  0.0357429  -0.03485899  0.12340238  0.0052609   0.0144986
   0.02494029 -0.07492008  0.05109007]
 [ 0.00840097 -0.05604085  0.00294469  0.0052609   0.02529647 -0.01263607
  -0.02327657 -0.01136774 -0.01037048]
 [ 0.01461576  0.02600021  0.03209583  0.0144986  -0.01263607  0.07415853
  -0.05387152 -0.0345835  -0.00342481]
 [-0.08596712  0.06158462 -0.05378417  0.02494029 -0.02327657 -0.05387152
   0.11053971  0.00903926  0.04727671]
 [ 0.07512366  0.0229808   0.00490397 -0.07492008 -0.01136774 -0.0345835
   0.00903926  0.09436665 -0.03526195]
 [-0.07509614  0.03506849 -0.02751816  0.05109007 -0.01037048 -0.00342481
   0.04727671 -0.03526195  0.03900974]]

covariance via numpy.dot():
[[ 0.3211555  -0.34304471 -0.01453859 -0.1071505   0.14357829]
 [-0.34304471  1.24506647 -0.11174019 -0.43907983 -0.35120174]
 [-0.01453859 -0.11174019  0.57018674 -0.10412646 -0.3397815 ]
 [-0.1071505  -0.43907983 -0.10412646  0.60465919  0.0456976 ]
 [ 0.14357829 -0.35120174 -0.3397815   0.0456976   0.50170735]]

eigenvalues via numpy.cov covariance matrix:
[  3.34339027e-01 +0.00000000e+00j   1.98268985e-01 +0.00000000e+00j
   5.71434551e-02 +0.00000000e+00j   1.13399310e-01 +0.00000000e+00j
   3.38418299e-18 +1.46714498e-17j   3.38418299e-18 -1.46714498e-17j
   1.20944017e-18 +0.00000000e+00j  -8.89005842e-18 +0.00000000e+00j
  -6.59244508e-18 +0.00000000e+00j]

eigenvectors via numpy.cov covariance matrix:
[[-0.33898927+0.j          0.01567746+0.j         -0.32410513+0.j
   0.01868249+0.j          0.03901578-0.09858459j  0.03901578+0.09858459j
  -0.17596347+0.j          0.08294235+0.j          0.04883282+0.j        ]
 [ 0.03740184+0.j         -0.01106985+0.j          0.11199662+0.j
  -0.36257285+0.j          0.66513867+0.j          0.66513867+0.j
   0.34810753+0.j         -0.05174886+0.j         -0.21147240+0.j        ]
 [ 0.42193056+0.j          0.10153367+0.j         -0.52774125+0.j
  -0.57292678+0.j         -0.02584078-0.15425679j -0.02584078+0.15425679j
  -0.02594397+0.j         -0.23132722+0.j         -0.33824532+0.j        ]
 [-0.08723679+0.j         -0.17700647+0.j         -0.04490487+0.j
   0.14531440+0.j         -0.08669754+0.21485879j -0.08669754-0.21485879j
  -0.73208352+0.j          0.04474123+0.j         -0.09159437+0.j        ]
 [-0.26991334+0.j          0.39182156+0.j          0.18023454+0.j
  -0.14727224+0.j         -0.21261400+0.1100362j  -0.21261400-0.1100362j
   0.15211635+0.j          0.54168898+0.j         -0.36386803+0.j        ]
 [-0.39361702+0.j          0.48389127+0.j          0.12668909+0.j
   0.07739853+0.j          0.31569702-0.34166187j  0.31569702+0.34166187j
   0.11287735+0.j         -0.74889136+0.j         -0.42472067+0.j        ]
 [-0.29962418+0.j         -0.01577641+0.j          0.35742257+0.j
  -0.68969822+0.j         -0.28182091+0.13998238j -0.28182091-0.13998238j
  -0.40124817+0.j          0.06419507+0.j          0.47506061+0.j        ]
 [-0.57032501+0.j         -0.60505095+0.j         -0.30688172+0.j
  -0.11823642+0.j          0.07618472-0.0915626j   0.07618472+0.0915626j
   0.32272841+0.j         -0.10872383+0.j         -0.25867852+0.j        ]
 [-0.23498699+0.j          0.45164240+0.j         -0.57569388+0.j
   0.03856674+0.j         -0.07478874+0.27512969j -0.07478874-0.27512969j
  -0.10101603+0.j          0.25440413+0.j          0.47403650+0.j        ]]

eigenvalues via numpy.dot covariance matrix:
[  1.33735611e+00   7.93075942e-01   2.08276008e-16   4.53597239e-01
   2.28573820e-01]

eigenvectors via numpy.dot covariance matrix:
[[ 0.1223889  -0.87441162 -0.4472136  -0.13172011  0.05545353]
 [-0.54658696  0.08157704 -0.4472136   0.61361759  0.34360056]
 [ 0.70163289  0.24699239 -0.4472136   0.41717057 -0.26958257]
 [-0.41754523  0.17603863 -0.4472136  -0.33135976 -0.69632398]
 [ 0.1401104   0.36980356 -0.4472136  -0.56770828  0.56685246]]

1 Answer

4
votes

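np.dot(b, b.T) is just the matrix product of b with its transpose. That's not the covariance. Why are you using rowvar=0? With rowvar=0, np.cov treats each of the 9 columns as a variable and each of the 5 rows as an observation, which is why you get a 9x9 matrix. If you just do np.cov(b), each row is treated as a variable, and it gives a matrix of the right dimensions (5x5).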
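
To make the shape difference concrete, here is a minimal sketch using a random (5, 9) array as a stand-in for the flattened data. The seed and variable names are assumptions for illustration, not taken from the question. It also checks one further point that is not part of the answer above: when b has been centered column-wise, as in the question, np.dot(b, b.T) is an unnormalized product whose nonzero eigenvalues should be (N-1) times those of np.cov(b, rowvar=0), with N = 5 rows. That appears consistent with the printed output in the question (1.337 is about 4 x 0.334).

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed, chosen only for reproducibility
b = rng.random((5, 9))               # stand-in for the flattened (5, 9) array
b -= b.mean(axis=0)                  # column-wise centering, as in the question

cov_rows = np.cov(b)                 # default rowvar=True: rows are variables -> (5, 5)
cov_cols = np.cov(b, rowvar=False)   # columns are variables -> (9, 9)
gram = np.dot(b, b.T)                # plain matrix product -> (5, 5), not normalized

n_obs = b.shape[0]
# eigvalsh assumes a symmetric matrix and returns real eigenvalues in ascending order.
# After column centering, at most n_obs - 1 eigenvalues are nonzero, so compare those.
top_cov = np.linalg.eigvalsh(cov_cols)[-(n_obs - 1):]
top_gram = np.linalg.eigvalsh(gram)[-(n_obs - 1):]

print(cov_rows.shape, cov_cols.shape, gram.shape)
print(np.allclose(top_gram, (n_obs - 1) * top_cov))
```

Because covariance matrices are symmetric, np.linalg.eigh (or eigvalsh, as used here) is usually a better fit than np.linalg.eig: it returns real eigenvalues, which avoids the tiny complex parts (on the order of 1e-17) that show up in the eig output in the question.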