I have tried k-NN classification on the toy data below and obtained the following predictions:
actual <- c(rep('A1',12), rep('A2',12), rep('A3',7), rep('A4',12), rep('B1',11), rep('B2',17), rep('C1',15))
prediction <- c('A1','A1','A1','A1','A1','A3','A4','A4','B1','B2','C1','C1',
'A2','A2','A2','A2','A2','A3','A4','A4','A4','B1','B1','C1',
'A1','A2','A3','A3','A3','A3','B2',
'A1','A1','A2','A2','A2','A4','A4','A4','A4','A4','A4','B1',
'A1','A2','A2','A4','B1','B1','B1','B2','B2','B2','B2',
'A1','A3','B1','B1','B1','B1','B2','B2','B2','B2','B2','B2','B2','B2','B2','C1','C1',
'A1','A1','A2','B2','B2','C1','C1','C1','C1','C1','C1','C1','C1','C1','C1')
A basic overview of the predictions can be obtained with table():
table(actual, prediction)
# prediction
# actual A1 A2 A3 A4 B1 B2 C1
# A1 5 0 1 2 1 1 2
# A2 0 5 1 3 2 0 1
# A3 1 1 4 0 0 1 0
# A4 2 3 0 6 1 0 0
# B1 1 2 0 1 3 4 0
# B2 1 0 1 0 4 9 2
# C1 2 1 0 0 0 2 10
A more informative summary is available from caret::confusionMatrix().
# recent versions of caret expect factor inputs, so coerce explicitly
caret::confusionMatrix(factor(prediction), factor(actual))
# Confusion Matrix and Statistics
#
# Reference
# Prediction A1 A2 A3 A4 B1 B2 C1
# A1 5 0 1 2 1 1 2
# A2 0 5 1 3 2 0 1
# A3 1 1 4 0 0 1 0
# A4 2 3 0 6 1 0 0
# B1 1 2 0 1 3 4 0
# B2 1 0 1 0 4 9 2
# C1 2 1 0 0 0 2 10
#
# Overall Statistics
#
# Accuracy : 0.4884
# 95% CI : (0.379, 0.5986)
# No Information Rate : 0.1977
# P-Value [Acc > NIR] : 1.437e-09
#
# Kappa : 0.3975
# Mcnemar's Test P-Value : NA
#
# Statistics by Class:
#
# Class: A1 Class: A2 Class: A3 Class: A4 Class: B1 Class: B2 Class: C1
# Sensitivity 0.41667 0.41667 0.57143 0.50000 0.27273 0.5294 0.6667
# Specificity 0.90541 0.90541 0.96203 0.91892 0.89333 0.8841 0.9296
# Pos Pred Value 0.41667 0.41667 0.57143 0.50000 0.27273 0.5294 0.6667
# Neg Pred Value 0.90541 0.90541 0.96203 0.91892 0.89333 0.8841 0.9296
# Prevalence 0.13953 0.13953 0.08140 0.13953 0.12791 0.1977 0.1744
# Detection Rate 0.05814 0.05814 0.04651 0.06977 0.03488 0.1047 0.1163
# Detection Prevalence 0.13953 0.13953 0.08140 0.13953 0.12791 0.1977 0.1744
# Balanced Accuracy 0.66104 0.66104 0.76673 0.70946 0.58303 0.7067 0.7981
I have observed that many of the subclasses belong to a common parent class. For example, A1, A2, A3, and A4 belong to class A; similarly, B1 and B2 belong to class B. I would like to compute the statistics after treating all subclasses within a class as equivalent. Is there a function that can generate statistics like the above for within-class and out-of-class errors?
Note: Please do not propose solutions that simply strip the numbers from the subclass labels, because the real application does not follow this naming pattern; I chose these names only to keep the example simple.
Is it possible to obtain such a solution if the class–subclass relations are given explicitly, for example as a mapping like the one sketched below?
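For concreteness, here is a minimal sketch of how I imagine the class–subclass relation could be supplied, assuming a simple named-vector mapping (the names are only illustrative; the real labels differ):

# hypothetical subclass -> parent class mapping, for illustration only
class_map <- c(A1 = 'A', A2 = 'A', A3 = 'A', A4 = 'A',
               B1 = 'B', B2 = 'B',
               C1 = 'C')
# collapsing both vectors through the mapping treats subclasses as equal
table(class_map[actual], class_map[prediction])

However, beyond such a collapsed table I am after the within-class and out-of-class analogues of the per-class statistics shown above.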