I have had zero luck finding this elsewhere on the site, so here's my problem. I loop through about a thousand MAT-files, each with about 10,000 data points. I'm trying to create one overall histogram of this data, but concatenating everything just to pass it to hist isn't feasible.

I was hoping to create N and Bin variables on the first loop iteration using hist(y), then update N and Bin on each later iteration using hist(y_new), and so on. That way the source data doesn't grow, and when the loop finally ends I can just use bar(). If this method won't work, I'm very open to other solutions.

Also, it is probably not safe to assume that the x data will remain constant from one iteration to the next. I'm using MATLAB R2012a.

Thanks for any help!!

Look at histc instead of hist. Why is your suggested approach not working? It's unclear what you've tried and why it doesn't give the desired results. - Wolfie
I'm not too familiar with histc. It looks like it takes predefined edges and a data vector. I'll take a look into that. And my method might work, I just don't know the proper way to actually combine the N and Bin variables. It's definitely not just the average of the two or something simple. I'm just unaware of how to do it. - user3014597
If you provide consistent bins every time you call histc, then you can just add the next results into the bin totals. So you have one edges array, one bin totals array for those edges, and at each step you add the bin totals for the next data set. - Wolfie
Yeah, unfortunately that's the problem: the bins change. It's not feasible to run through all my MAT-files to create the edges and then go back through and bar everything. That's why I was hoping for a way to dynamically update the hist. I'll keep chugging along. - user3014597
The max and min values might change, but just pick a bin width up front and add bins as necessary. - Wolfie
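
To make the "one edges array, one bin totals array" idea from the comments concrete, here is a minimal sketch. It assumes the edges can be fixed up front, with -Inf and Inf as outer edges so that out-of-range values still get counted. The edge values, the file count, and the randn stand-in data are all made up for illustration and are not the asker's.

```matlab
% Sketch: accumulate a histogram over many data sets using one fixed edge vector.
% The edges and the stand-in data below are assumptions for illustration only.
edges  = [-Inf, -10:2:10, Inf];      % outer bins catch anything beyond +/-10
totals = zeros(1, numel(edges));     % histc returns one count per edge

nFiles = 5;                          % hypothetical number of MAT-files
nSeen  = 0;                          % running number of data points, for checking
for k = 1:nFiles
    y = 4*randn(1, 1000);            % stand-in for the data loaded from file k
    totals = totals + histc(y, edges);   % same edges every time, so counts just add
    nSeen  = nSeen + numel(y);
end

totals(end) = [];                    % last element only counts y == Inf
```

After the loop, sum(totals) should equal nSeen, and the finite bins can be plotted with bar(edges(2:end-2) + 1, totals(2:end-1)). The two outer totals show how much data fell outside the chosen range, which is one way to tell whether the fixed edges were wide enough.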

1 Answer

I think the best solution here is to loop through your files twice: once to set the bins and once to build the histogram. If that is impossible in your case, here's a one-shot solution that requires you to set the bin width beforehand.

clear; close all;
rng('default') % for reproducibility

% make example data
N = 10; % number of data files
M = 5; % length of data files
xs = cell(1,N);
for i = 1:N
    xs{i} = trnd(1,1,M); % heavy-tailed samples (Student's t, 1 dof); needs the Statistics Toolbox
end

% parameters
width = 2;

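% main: accumulate counts one file at a time, extending the edges as needed
for i = 1:numel(xs)
    x = xs{i}(:).'; % "load data"; force a row vector so the counts are rows too
    lims = [min(x) max(x)]; % not named "range", which would shadow the toolbox function
    % edges are multiples of width anchored at 0, so every file shares one grid
    binsPos = 0:width:lims(2)+width;
    binsNeg = fliplr( 0:-width:lims(1)-width );
    newBins = [binsNeg(1:end-1) binsPos];
    newCounts = histc(x, newBins);
    newCounts(end) = []; % last element counts x == newBins(end), always zero here; see help histc

    if i == 1
        counts = newCounts;
        bins = newBins;
    else
        % combine new and old counts on edges that span both sets of bins
        allBins = min(bins(1), newBins(1)) : width : max(bins(end), newBins(end));
        allCounts = zeros(1,length(allBins)-1);
        % locate each block by its offset from allBins(1); round() avoids exact
        % floating-point == comparisons, which can fail for non-integer widths
        iOld = round((bins(1) - allBins(1))/width) + (1:length(counts));
        iNew = round((newBins(1) - allBins(1))/width) + (1:length(newCounts));
        allCounts(iOld) = counts;
        allCounts(iNew) = allCounts(iNew) + newCounts;

        bins = allBins;
        counts = allCounts;
    end
end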

% check: the incremental counts should match one histc call on all the data
figure
bar(bins(1:end-1) + width/2, counts)

xFull = [xs{:}];
fullCounts = histc(xFull, bins);
fullCounts(end) = [];
figure
bar(bins(1:end-1) + width/2, fullCounts)
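
For comparison, the two-pass approach recommended at the top of this answer could look roughly like the sketch below. It assumes that reading every file twice is affordable. The cell array xs again stands in for the files on disk (filled with randn here so no toolbox is needed), and lo, hi, and edges are names introduced only for this example.

```matlab
% Sketch of the two-pass approach; xs stands in for the files on disk.
rng('default')                       % for reproducibility
xs = cell(1, 10);
for i = 1:numel(xs)
    xs{i} = randn(1, 5);             % stand-in data, not the asker's
end
width = 2;                           % bin width, chosen up front

% pass 1: find the global extremes without keeping any data around
lo = Inf;
hi = -Inf;
for i = 1:numel(xs)
    x  = xs{i};                      % "load data"
    lo = min(lo, min(x));
    hi = max(hi, max(x));
end
% edges on multiples of width; the last edge lies strictly above hi
edges = width * (floor(lo/width) : floor(hi/width) + 1);

% pass 2: accumulate counts on the now-fixed edges
counts = zeros(1, numel(edges));
for i = 1:numel(xs)
    counts = counts + histc(xs{i}, edges);
end
counts(end) = [];                    % counts x == edges(end), always zero here
```

The result can be plotted the same way as above, with bar(edges(1:end-1) + width/2, counts). The trade-off is extra file I/O in exchange for never having to resize or realign the count vector.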