To simplify my question, I have created a dummy problem here: I have two training datasets, labelled 1 and 2 respectively. Both are assumed to follow a mixture of Gaussians (MoG) distribution. I can easily use the Matlab toolbox function gmdistribution.fit to estimate the means and covariances of their components.
Then I have a testing dataset that is assumed to have been generated by an MoG similar to that of training dataset 2, but with noise. I would like to calculate something like a likelihood that the testing dataset was generated by the MoG of training dataset 2 rather than by that of training dataset 1. In other words, I would like a measure of how likely it is that the testing dataset has label 2.
Could you please point me in the right direction? Thanks very much.
N.B.:
- The sizes of my two training datasets are different.
- The distributions of the two training datasets overlap.
- The size of the testing dataset is much smaller than the training datasets.
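To make the goal concrete, here is a sketch of the quantity I think I am after. The notation is my own assumption: x_1, ..., x_N are the test points, p_k is the density of the MoG fitted to training dataset k, with mixing weights \pi_{kj}, means \mu_{kj} and covariances \Sigma_{kj}, and P(k) is a prior for label k. I also assume the test points are independent, which may not hold for real data.

$$
p_k(x) = \sum_{j} \pi_{kj}\, \mathcal{N}(x;\ \mu_{kj}, \Sigma_{kj}), \qquad
\log L_k = \sum_{i=1}^{N} \log p_k(x_i), \qquad k \in \{1, 2\}
$$

$$
P(2 \mid x_1, \dots, x_N) = \frac{P(2)\, L_2}{P(1)\, L_1 + P(2)\, L_2}
$$

Because the two training datasets have different sizes, I am not sure whether the priors P(1) and P(2) should reflect those sizes or simply be taken as equal.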
Some Matlab code:
%% Mixture of Gaussians 1 (Training set 1)
mean1 = [1 -2];
cov1 = [2 0; 0 .5];
mean2 = [0.5 -5];
cov2 = [1 0; 0 1];
trainingDataset1 = [mvnrnd(mean1, cov1, 1000); mvnrnd(mean2, cov2, 1000)];
MoGOptions = statset('Display', 'final');
MoGObj1 = gmdistribution.fit(trainingDataset1, 2, 'Options', MoGOptions);
figure,
scatter(trainingDataset1(:,1), trainingDataset1(:,2), 10, '.')
hold on
ezcontour(@(x,y)pdf(MoGObj1,[x y]), [-8 6], [-8 2]);
%% Mixture of Gaussians 2 (Training set 2)
mean4 = [0.5 -1];
cov4 = [1.5 0; 0 .8];
mean5 = [-2 -3];
cov5 = [1 0; 0 1];
mean6 = [-4 -2];
cov6 = [1 0; 0 1];
trainingDataset2 = [mvnrnd(mean4, cov4, 500); mvnrnd(mean5, cov5, 500); mvnrnd(mean6, cov6, 500)];
MoGOptions = statset('Display', 'final');
% Fit 3 components, matching the three Gaussians used to generate trainingDataset2
MoGObj2 = gmdistribution.fit(trainingDataset2, 3, 'Options', MoGOptions);
figure,
scatter(trainingDataset2(:,1), trainingDataset2(:,2), 10, '.')
hold on
ezcontour(@(x,y)pdf(MoGObj2,[x y]), [-8 6], [-8 2]);
%% Test set
mean7 = [1.1 -2.1];
cov7 = [2.2 0; 0 .4];
mean8 = [0.3 -5.4];
cov8 = [1.2 0; 0 1.1];
testingDataset1 = [mvnrnd(mean7, cov7, 100); mvnrnd(mean8, cov8, 100)];
figure,
scatter(testingDataset1(:,1), testingDataset1(:,2), 10, '.')
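What I imagine, as a rough sketch only, is something along the lines below. It continues the script above and reuses the same pdf call that the contour plots use. The variable names logL1, logL2, logLR and postLabel2 are my own, and the code assumes independent test points and equal priors for the two labels, neither of which I am sure is appropriate.

```matlab
%% Sketch: compare the test set under the two fitted mixtures
% Per-point log-densities of the test points under each fitted MoG
logp1 = log(pdf(MoGObj1, testingDataset1));
logp2 = log(pdf(MoGObj2, testingDataset1));

% Total log-likelihood of the whole test set (assumes independent points)
logL1 = sum(logp1);
logL2 = sum(logp2);

% Log-likelihood ratio: positive values favour the MoG of training set 2
logLR = logL2 - logL1;

% Posterior probability of label 2 for the whole test set,
% assuming equal priors for the two labels (an assumption, see N.B.)
postLabel2 = 1 / (1 + exp(logL1 - logL2));

fprintf('logL1 = %.2f, logL2 = %.2f, logLR = %.2f, P(label 2) = %.4f\n', ...
        logL1, logL2, logLR, postLabel2);
```

Summing log-densities rather than multiplying densities keeps the computation away from numerical underflow, but with many test points the posterior will tend to saturate at 0 or 1, so the log-likelihood ratio may be the more informative number.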