Cross validation and ROC curve using Matlab: how to plot the mean ROC curve?

Question

I am using k-fold cross-validation with k = 10, so I have 10 ROC curves, and I would like to average them. I can't simply average the Y values returned by perfcurve, because the returned vectors are not the same size.

[X1,Y1,T1,AUC1] = perfcurve(t_test(1),resp(1),1);
.
.
.
[X10,Y10,T10,AUC10] = perfcurve(t_test(10),resp(10),1);

How can I solve this? How can I plot the average of the 10 ROC curves?

saastn · Accepted Answer · 2020-10-16T22:22:41

So, you have k curves with different numbers of points, all bounded in the [0, 1] interval in both dimensions. First, interpolate each curve at a common set of query points. The resampled curves then all have the same number of points, so you can compute their mean point by point. The interp1 function does the interpolation part.

%% generating sample data
k = 10;
X = cell(k, 1);
Y = cell(k, 1);
hold on;
for i=1:k
    n = 10+randi(10);
    X{i} = sort([0 1 rand(1, n)]);
    Y{i} = sort([0 1 rand(1, n)].^.5);
end

%% Calculating interpolations
% location of query points
X2 = linspace(0, 1, 50);
n = numel(X2);
% initializing values for different curves at different query points
Y2 = zeros(k, n);
for i=1:k
    % finding interpolated values for i-th curve
    Y2(i, :) = interp1(X{i}, Y{i}, X2);
end
% finding the mean
meanY = mean(Y2, 1);
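
To actually draw the result, something along these lines might work. It is only a sketch on synthetic curves built the same way as the sample data above; the variable names Xq, Yq, stdY and meanAUC, the one-standard-deviation band and the trapz-based area are illustrative additions, not part of the original approach.

```matlab
% Synthetic stand-ins for the k per-fold curves (assumption: in practice
% X{i} and Y{i} would hold the outputs of perfcurve for fold i)
k = 10;
X = cell(k, 1);
Y = cell(k, 1);
for i = 1:k
    n = 10 + randi(10);
    X{i} = sort([0 1 rand(1, n)]);
    Y{i} = sort([0 1 rand(1, n)].^.5);
end

% Resample every curve on a common grid, then average point by point
Xq = linspace(0, 1, 50);
Yq = zeros(k, numel(Xq));
for i = 1:k
    Yq(i, :) = interp1(X{i}, Y{i}, Xq);   % linear interpolation (default)
end
meanY = mean(Yq, 1);
stdY  = std(Yq, 0, 1);                    % spread across folds at each query point

% Individual curves in grey, mean curve in black, +/- 1 std as dashed lines
figure; hold on;
for i = 1:k
    plot(X{i}, Y{i}, 'Color', [0.7 0.7 0.7]);
end
plot(Xq, meanY, 'k', 'LineWidth', 2);
plot(Xq, min(meanY + stdY, 1), 'k--');    % clipped to the [0, 1] range
plot(Xq, max(meanY - stdY, 0), 'k--');
plot([0 1], [0 1], 'r:');                 % chance line
xlabel('False positive rate');
ylabel('True positive rate');
hold off;

% Area under the mean curve, by the trapezoidal rule
meanAUC = trapz(Xq, meanY);
```

The band here is just the per-point standard deviation over the folds; whether that is the right measure of uncertainty depends on your application.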

Note that the choice of interpolation method can affect your results. ROC plot data are staircase-like, so to read exact values off such curves you should use previous-neighbor interpolation ('previous') instead of linear interpolation, which is the default method of interp1:

Y2(i, :) = interp1(X{i}, Y{i}, X2); % linear
Y3(i, :) = interp1(X{i}, Y{i}, X2, 'previous');

This is how it affects the final results:

[Figure: effect of the interpolation method on the mean curve]
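
One practical caveat: curves computed from real data typically contain repeated X values (the vertical steps of the staircase), while interp1 requires distinct sample points. A minimal sketch of one possible workaround is to collapse each repeated X to a single point before interpolating. The ROC points below are made-up illustrative values, and keeping the highest Y at each repeated X is an assumption about which convention you want, not the only reasonable choice.

```matlab
% Hypothetical ROC points for one fold, with repeated X values
x = [0 0   0.25 0.25 0.5 0.5 1];
y = [0 0.4 0.4  0.7  0.7 1   1];

% Collapse duplicates: one point per distinct X, keeping the largest Y
% (assumption: the top of each vertical step is the value of interest)
[xu, ~, idx] = unique(x(:));
yu = accumarray(idx, y(:), [], @max);

% Previous-neighbor interpolation on a common grid of query points
xq = [0 0.1 0.25 0.3 0.5 0.75 1];
yq = interp1(xu, yu, xq, 'previous');
% yq is [0.4 0.4 0.7 0.7 1 1 1]
```

With this convention the resampled curve starts at the top of the first step rather than at 0, so you may want to prepend the point (0, 0) when plotting.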
