0
votes

In MATLAB, say I have the following data:

data = [4 0.1; 6 0.5; 3 0.8; 2 1.4; 7 1.6; 12 1.8; 9 1.9; 1 2.3; 5 2.5; 5 2.6];

I want to place the 1st column into bins according to the elements in the 2nd column (i.e. 0-1, 1-2, 2-3, ...), and calculate the mean and 95% confidence interval of the column 1 elements within each bin. So I'd have a matrix something like this:

mean   lower_95%   upper_95%    bin
4.33                            0
7.5                             1
3.67                            2

2 Answers

2
votes

You can use accumarray with the appropriate function for the mean (mean) or the quantiles (quantile):

m = accumarray(floor(data(:,2))+1, data(:,1), [], @mean);
l = accumarray(floor(data(:,2))+1, data(:,1), [], @(x) quantile(x,.05));
u = accumarray(floor(data(:,2))+1, data(:,1), [], @(x) quantile(x,.95));
result = [m l u (0:numel(m)-1).'];
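The grouping index floor(data(:,2))+1 works because the bins are one unit wide and start at 0. As a minimal sketch for bins defined by arbitrary edges, the index could instead come from discretize (available in newer MATLAB releases). The variable names edges, idx and bin_left are assumptions introduced here for illustration:

```matlab
data = [4 0.1; 6 0.5; 3 0.8; 2 1.4; 7 1.6; 12 1.8; 9 1.9; 1 2.3; 5 2.5; 5 2.6];

% Bin edges (assumed here; any increasing vector covering column 2 works)
edges = 0:1:3;

% Bin number for each row: bin k covers [edges(k), edges(k+1)).
% Values outside the edges give NaN, which accumarray would reject.
idx = discretize(data(:,2), edges);

% Mean of column 1 within each bin, as in the answer
m = accumarray(idx, data(:,1), [], @mean);

% Left edge of each bin, to label the rows
bin_left = edges(1:numel(m)).';
result_edges = [m bin_left];
```

With unit-width edges starting at 0, idx matches floor(data(:,2))+1, so the rest of the accumarray calls can stay unchanged.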

This can also be done by calling accumarray once with cell array output (the bin column is then appended separately):

result = accumarray(floor(data(:,2))+1, data(:,1), [],...
    @(x) {[mean(x) quantile(x,.05) quantile(x,.95)]});
result = cell2mat(result);
result = [result (0:size(result,1)-1).'];

For your example data,

result =
    4.3333    3.0000    6.0000         0
    7.5000    2.0000   12.0000    1.0000
    3.6667    1.0000    5.0000    2.0000
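Note that the quantiles above describe the spread of the data within each bin (the 5th and 95th percentiles span the central 90% of the values). If what is wanted is a 95% confidence interval for the bin mean, one common choice is the t-based interval, mean ± t * std / sqrt(n). The following is only a sketch under that assumption (roughly normal data within each bin). It uses tinv from the Statistics and Machine Learning Toolbox, and the names idx, s, n, half and ci_result are introduced here for illustration:

```matlab
data = [4 0.1; 6 0.5; 3 0.8; 2 1.4; 7 1.6; 12 1.8; 9 1.9; 1 2.3; 5 2.5; 5 2.6];

idx = floor(data(:,2)) + 1;                  % bin index: 0-1 -> 1, 1-2 -> 2, ...
m = accumarray(idx, data(:,1), [], @mean);   % mean per bin
s = accumarray(idx, data(:,1), [], @std);    % sample standard deviation per bin
n = accumarray(idx, 1);                      % number of points per bin

% Half-width of the two-sided 95% t-interval for the mean.
% Empty or single-point bins would give NaN here; the example data has neither.
half = tinv(0.975, n - 1) .* s ./ sqrt(n);

% Columns: mean, lower 95%, upper 95%, bin
ci_result = [m, m - half, m + half, (0:numel(m)-1).'];
```

With only three or four points per bin, these intervals are wide, which reflects how little the example data says about each bin mean.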
1
votes

The loop below outputs a matrix with the columns labelled in its comments. Note that mean ± 2 standard deviations is only a rough stand-in for a 95% interval: with so few points per bin in your example data, the bounds fall outside the range of the data. With a larger (normally distributed) data set, you wouldn't see this.

Your data:

data = [4 0.1; 6 0.5; 3 0.8; 2 1.4; 7 1.6; 12 1.8; 9 1.9; 1 2.3; 5 2.5; 5 2.6];

Binning for output table:

% Initialise output matrix. Columns:
% Mean, lower 95%, upper 95%, bin left, bin right 
bins = [0 1; 1 2; 2 3];
out = zeros(size(bins,1),5);
% Cycle through bins
for ii = 1:size(bins,1)
    % Store logical array of which elements fit in given bin
    % You may want to include edge case for "greater than or equal to" leftmost bin. 
    % Alternatively you could make the left bin equal to "left bin - eps" = -eps
    bin = data(:,2) > bins(ii,1) & data(:,2) <= bins(ii,2);
    % Calculate mean, and mean +- 2*std deviation for confidence intervals
    % (statistics are taken over column 1; column 2 only decides the bin)
    out(ii,1) = mean(data(bin,1));
    out(ii,2) = out(ii,1) - 2*std(data(bin,1));
    out(ii,3) = out(ii,1) + 2*std(data(bin,1));
end
% Append bins to the matrix 
out(:,4:5) = bins;

Output:

out =

    4.3333    1.2783    7.3884         0    1.0000
    7.5000   -0.9063   15.9063    1.0000    2.0000
    3.6667   -0.9521    8.2855    2.0000    3.0000
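The comments in the loop mention the left-edge case: with a strict > on the left, a value exactly equal to the leftmost edge (0 here) falls into no bin. A minimal sketch of one way to handle it, closing the first bin on both sides. The values in x are hypothetical, chosen so that one of them sits exactly on the leftmost edge:

```matlab
x = [0; 0.5; 1; 2.5];            % hypothetical values; 0 is exactly on the leftmost edge
bins = [0 1; 1 2; 2 3];
counts = zeros(size(bins,1), 1); % number of values landing in each bin

for ii = 1:size(bins,1)
    if ii == 1
        % First bin is closed on both sides: [left, right]
        in_bin = x >= bins(ii,1) & x <= bins(ii,2);
    else
        % Remaining bins are half-open: (left, right]
        in_bin = x > bins(ii,1) & x <= bins(ii,2);
    end
    counts(ii) = sum(in_bin);
end
```

Here the value 0 is counted in the first bin. With the strict comparison used for every bin it would be dropped silently.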