8
votes

In a Python script I'm writing I am simulating multivariate normal random vectors with the expression

np.random.multivariate_normal(np.zeros(dim_obs), y_cov)

My script runs, but generates the following warning:

RuntimeWarning: covariance is not positive-semidefinite.

Also the little debug print statements I throw in there print False most of the time

print( np.all(np.linalg.eigvals(y_cov) > 0) )

Why is this throwing false positives? My y_cov is positive semi-definite because it is (sorry about the lack of TeX markup) B x x'B' + y y' where the B is a matrix, and the others are random vectors with each element positive.

In this particular run B is actually just a ones vector of size 9. Can I just ignore this warning? From the documentation:

Note that the covariance matrix must be positive semidefinite (a.k.a. nonnegative-definite). Otherwise, the behavior of this method is undefined and backwards compatibility is not guaranteed.

Edit: here's a runnable thing altogether. Thanks for the tip @user2357112.

import numpy as np
num_factors = 1
dim_obs = 9
u = np.random.normal(size = num_factors)
v = np.random.normal(size = dim_obs)
y_cov = np.dot(np.ones((9,1)), np.exp(u.reshape((num_factors,1))/2))
y_cov = np.dot(y_cov, np.exp(u.reshape((1,num_factors))/2)) #transpose
y_cov = np.dot(y_cov, np.transpose(np.ones((9,1))))
y_cov += np.dot(np.exp( v.reshape((dim_obs,1)) / 2), 
                np.exp( v.reshape((1,dim_obs)) / 2))
print( np.random.multivariate_normal(np.zeros(dim_obs), y_cov) )
print( np.all(np.linalg.eigvals(y_cov) > 0) )
print( np.linalg.eigvals(y_cov)  )
2
Instead of NumPy being wrong, have you considered that, say, your y_cov might not be what you think it is? Show us runnable code that demonstrates the problem when run. These two lines of code aren't enough to diagnose the problem.user2357112 supports Monica
Aside: some of your reshape/dot computations would be more clearly expressed as the outer product of vectors.user6655984

2 Answers

15
votes

Theoretically, your matrix is positive semidefinite, with several eigenvalues being exactly zero. But the computations with floating point numbers introduce truncation errors which result in some of those eigenvalues being very small but negative; hence, the matrix is not positive semidefinite.

For the time being, it looks like the warning may be ignored; but NumPy documentation says that the behavior in non-psd case is undefined, so I would not want to rely on this. A way to correct for the floating point errors is to add a tiny multiple of the identity matrix to y_cov. For example, like this:

min_eig = np.min(np.real(np.linalg.eigvals(y_cov)))
if min_eig < 0:
    y_cov -= 10*min_eig * np.eye(*y_cov.shape)

Adding a fixed multiple of identity, like 1e-12, would work for all reasonable size matrices and still wouldn't matter for the results.


For completeness, a simpler way to reproduce the issue:

import numpy as np
x = np.random.normal(size=(5,))
y = np.outer(x, x)
z = np.random.multivariate_normal(np.zeros(5), y)    

This throws the same warning (with high probability).

5
votes

A more efficient way to generate the Gaussian samples in your case, which is also immune to the numerical issues identified by @zaq, is to observe that a multivariate, zero mean Gaussian random vector with covariance matrix equal to a*a.T + b*b.T (a, b: column vectors) is equal in distribution to the random vector a*w1 + b*w2 where w1 and w2 are independent Gaussian scalar random variables of zero mean and variance 1