When applying this method:
%% When an outlier is considered to be more than three standard deviations away from the mean, use the following syntax to determine the number of outliers in each column of the count matrix:
mu = mean(data)
sigma = std(data)
[n,p] = size(data);
% Create a matrix of mean values by replicating the mu vector for n rows
MeanMat = repmat(mu,n,1);
% Create a matrix of standard deviation values by replicating the sigma vector for n rows
SigmaMat = repmat(sigma,n,1);
% Create a matrix of zeros and ones, where ones indicate the location of outliers
outliers = abs(data - MeanMat) > 3*SigmaMat;
% Calculate the number of outliers in each column
nout = sum(outliers)
% To remove an entire row of data containing the outlier
data(any(outliers,2),:) = []; %% this line
The last line removes some observations (rows) from my dataset. However, this causes a problem later in my programme because I have hard-coded the number of observations (rows) as 1000:
%% generate sample data
K = 6;
numObservarations = 1000;
dimensions = 3;
If I change numObservarations to data, I get a scalar output error. However, if I don't change it, the number of rows no longer matches and I get this error:
??? Error using ==> minus
Matrix dimensions must agree.
Error in ==> datamining at 106
D(:,k) = sum( ((data - repmat(clusters(k,:),numObservarations,1)).^2), 2);
Is there a way to set numObservarations so that it automatically detects the number of rows in data and outputs that as just a number?
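To make the goal concrete, here is a minimal sketch of the behaviour I am after. It assumes that size(data,1) returns the row count as a scalar, and that it is called after the outlier rows have been removed. The random data, the planted outlier, and the centre variable are hypothetical stand-ins for my real dataset and for clusters(k,:).

```matlab
% Hypothetical stand-in data: 1000 observations, 3 dimensions
data = randn(1000, 3);
data(1, :) = 100;    % plant one obvious outlier so at least one row is removed

% Flag values more than three standard deviations from the column mean
mu = mean(data);
sigma = std(data);
n = size(data, 1);
outliers = abs(data - repmat(mu, n, 1)) > 3 * repmat(sigma, n, 1);

% Remove every row that contains an outlier
data(any(outliers, 2), :) = [];

% Recompute the row count AFTER the removal; size(data,1) is a scalar
numObservarations = size(data, 1);

% repmat now builds a matrix with the same number of rows as data
centre = mean(data);    % hypothetical stand-in for clusters(k,:)
D = sum((data - repmat(centre, numObservarations, 1)).^2, 2);
```

Passing data itself to repmat fails because repmat expects scalar replication counts, whereas size(data,1) gives a single number that tracks however many rows are left.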