Sampling parts of a vector from gaussian mixture model

Question

I want to sample only some elements of a vector from a sum of gaussians that is given by their means and covariance matrices.

Specifically:

I'm imputing data using a Gaussian mixture model (GMM). I'm using sklearn and the following procedure:

1. Impute with the mean.
2. Get means and covariances with a GMM (for example, 5 components).
3. Take one of the samples and sample only the missing values; the other values stay the same.
4. Repeat a few times.
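For concreteness, the first two steps of this procedure could look roughly like the sketch below. It assumes missing entries are stored as NaN and that a full covariance matrix per component is wanted; the function name `fit_gmm_on_mean_imputed` and its parameters are made up for illustration and are not part of sklearn.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_gmm_on_mean_imputed(X, n_components=5, seed=0):
    """Mean-impute NaNs column-wise, then fit a GMM (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)                        # assumption: NaN marks a missing entry
    col_means = np.nanmean(X, axis=0)            # column means over observed entries only
    X_filled = np.where(missing, col_means, X)   # step 1: impute with the mean
    gmm = GaussianMixture(
        n_components=n_components,
        covariance_type="full",                  # keeps off-diagonal covariance terms
        random_state=seed,
    ).fit(X_filled)                              # step 2: fit the mixture
    return gmm, X_filled, missing
```

After fitting, `gmm.weights_`, `gmm.means_` and `gmm.covariances_` hold the mixture weights, component means and covariance matrices that the later sampling steps need. The `missing` mask is returned so that only those entries get overwritten in later passes.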

There are two problems that I see with this: (A) how do I sample from the sum of Gaussians, and (B) how do I sample only part of the vector? I assume both can be solved at the same time. For (A), I could use rejection sampling or inverse transform sampling, but I feel there is a better way that uses the multivariate normal generators in numpy, or some other efficient method. For (B), I just need to multiply the sampled variable by a Gaussian that has the known values from the sample as an argument. Right?
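On (A): a weighted sum of Gaussian densities can be sampled without rejection or inverse transform sampling, by first picking a component with probability equal to its weight and then drawing from that component's multivariate normal. A minimal sketch, assuming the weights, means and covariances are already available as arrays (the function name `sample_gmm` is made up here):

```python
import numpy as np


def sample_gmm(weights, means, covs, n_samples, rng=None):
    """Draw n_samples points from a Gaussian mixture (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)

    # 1) Choose a component for every draw, with probability = mixture weight.
    comps = rng.choice(len(weights), size=n_samples, p=weights / weights.sum())

    # 2) Draw each point from the multivariate normal of its chosen component.
    out = np.empty((n_samples, means.shape[1]))
    for k in np.unique(comps):
        idx = np.flatnonzero(comps == k)
        out[idx] = rng.multivariate_normal(means[k], covs[k], size=len(idx))
    return out, comps
```

If the mixture was fitted with sklearn's `GaussianMixture`, its own `sample` method can also be used to draw full vectors directly; the sketch above only spells out the two-stage idea.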

I would prefer a solution in python but an algorithm or pseudocode would be sufficient.

Stackoverflow is primarily a programming site. You might try to get your algorithm at stats.stackexchange.com and then come back here if you need assistance programming it. — John1024

kirill_igum · Accepted Answer · 2014-10-25T17:18:49

Since only the relative proportions of the distribution matter for sampling, the scaling prefactor can be thrown away. For a diagonal covariance matrix, one can just use the covariance submatrix and mean subvector that have the dimensions of the missing data. For a covariance matrix with off-diagonal elements, the mean and standard deviation of the sampling Gaussian need to be changed to those of the conditional distribution given the known values.
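The sketch below shows one way this could be done for a single row, under a few assumptions. It uses the standard conditional-Gaussian formulas: for missing indices m and observed indices o, the conditional mean is mu_m + S_mo S_oo^-1 (x_o - mu_o) and the conditional covariance is S_mm - S_mo S_oo^-1 S_om. For a mixture, the component is chosen first, with the weights rescaled by how well each component explains the observed values. The function name `sample_missing` and the boolean `missing` mask are illustrative, not taken from any library.

```python
import numpy as np
from scipy.stats import multivariate_normal


def sample_missing(x, missing, weights, means, covs, rng=None):
    """Resample only the missing entries of one row x, given the observed ones."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    m = np.asarray(missing, dtype=bool)   # True where the value is missing
    o = ~m                                # True where the value is known
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    if not m.any():
        return x.copy()

    # Component probabilities given the observed entries only.
    if o.any():
        log_r = np.array([
            np.log(weights[k])
            + multivariate_normal.logpdf(x[o], means[k][o], covs[k][np.ix_(o, o)])
            for k in range(len(weights))
        ])
        r = np.exp(log_r - log_r.max())   # subtract the max for numerical stability
        r /= r.sum()
    else:
        r = weights / weights.sum()
    k = rng.choice(len(weights), p=r)

    # Conditional mean and covariance of the missing block for component k.
    mu, S = means[k], covs[k]
    if o.any():
        S_mo = S[np.ix_(m, o)]
        gain = np.linalg.solve(S[np.ix_(o, o)], S_mo.T).T   # S_mo S_oo^-1
        cond_mean = mu[m] + gain @ (x[o] - mu[o])
        cond_cov = S[np.ix_(m, m)] - gain @ S_mo.T
        cond_cov = (cond_cov + cond_cov.T) / 2              # remove round-off asymmetry
    else:
        cond_mean, cond_cov = mu[m], S[np.ix_(m, m)]

    out = x.copy()                        # observed values stay the same
    out[m] = rng.multivariate_normal(cond_mean, cond_cov)
    return out
```

With a diagonal covariance the `gain` term is zero, so the conditional mean and covariance reduce to the mean subvector and covariance submatrix described above. The component probabilities still depend on the observed values in that case.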
