0
votes

I've been given a coding exercise to get me up to speed with MATLAB, but I'm having a bit of an issue getting out what I need.

I am trying to numerically integrate the Gaussian probability function on given intervals (variable bins). I am trying to do this by using the trapezoidal rule. I would expect the sum of all those integrals to be close to one, however what I'm getting in my results is getting smaller (for example, attaining a value of 0.5 when I set the variable bins to 100).

What I'm essentially doing is subdividing the x-axis into "bins", and taking the integral of a Gaussian PDF along the points contained within the bins. So, for example, say that I have the x-axis running from 0 to 19.9, incrementing by 0.1, so I have 200 "points". Now, say that I want to have ten bins. Then, each bin will contain 20 points, and I need to perform an integral approximation between every point in each bin.

Now, I've kind of got the following happening so far;

clear

%Sets location of the Gaussian PDF
s_1 = 5;
s_2 = 15;

%Sets the domain of the x-axis
x = [0:0.1:19.9];

%Sets the number of bins, and the size of each bin
b = 100;
points = numel(x);
binsize = (points)/b;

%Arranges all of the prescribed x-values into their necessary bins
t = reshape(x,binsize,b);

%Calculates p(r_n|s_n)
for i = 1:b
    r_s1(i) = trapz(t(:,i),normpdf(t(:,i),s_1));
    r_s2(i) = trapz(t(:,i),normpdf(t(:,i),s_2));
end

Now, for a small number of bins (say, 10), the approximation comes out quite nicely, and the sums of the elements of r_s1 and r_s2 both come out relatively close to 1 (as the integral of the Gaussian PDF should be 1). However, as I start bumping up the sizes of my bins, the value of this sum starts dropping. All things considered, I'd have expected it to stay around the same value.

Is this an error in my mathematical approach to the problem I've been posed, or have I messed up my code??

1
Are you sure you don't want to use the Normal cumulative distribution function normcdf? Also: You didn't really specify what you are actually trying to solve (the mathematics).knedlsepp
I don't really know what your actual question is, but in regards to why the values will get smaller: Integrating with the trapezoidal rule you will be quite likely overestimating the integral, as the areas where you are integrating are mostly convex parts of the function. So finer integration will give more accurate results. You should obviously not really get to 1 if you don't do integration from -Inf to Inf as the support of the GPDF is infinite. So basically your integration rule is overestimating and will become more accurate with finer bins.knedlsepp
I guess my question is really asking what I'm doing wrong, in a sense. Inputting that code with a bin size of 10 gives me a value of about 0.9. However, if increase the bin size to say, 100, I get a value closer to 0.5. My impression was that an increase in bins would lead to an increase in the accuracy of the approximation, or at the very least, not change it too dramatically. I feel like there's an error in what I've written to make the estimation drop so much.Jack
Which value is 0.5? To ask: What am I doing wrong, you must first explain what your goal is. I can't find it in your question.knedlsepp
The sum of all the elements of the r_s1 array. Same for r_s2.Jack

1 Answers

1
votes

The bins you are using don't partition the interval [0, 19.9]. Looking at the variable t:

t = 
         0    0.2000    0.4000    0.6000    ...
    0.1000    0.3000    0.5000    0.7000    ...

You see that your first bin ends at 0.1, but the second bin starts only at 0.2, so in this case you are only integrating over half the interval, ergo the result ~0.5.

The solution that changes the least of your code would probably be to add

t = [x(1), t(end,1:end-1); t];

After your t = reshape(x,binsize,b); reshape line.