Classification project: Hand gesture detector fails to detect right class

Question

Classification Project

I have been working for a while on a hand gesture detector whose aim is to detect hand gestures from a picture, video, or webcam. I implemented the project in Python 2.7 using OpenCV and sklearn.

I have finally reached the point where the detector finds the hand using the following techniques:

Converts RGB colors to HSV and adjusts the HSV parameters (hue, saturation, value) manually. (I did this because I have read that RGB is not well suited to modeling skin color.)
Applies dilation, erosion, and median blur filters to get a thresholded area of skin colors.
Detects the face in the picture with a Haar cascade and removes it, because only the hand is of interest.
Once the face area is removed, the hand is the largest area that contains skin color. Finds the contour around that area.
I also wanted to get rid of the wrist, so I located the center of the palm with cv2.pointPolygonTest by finding the point with the maximum distance inside the palm, drew a circle inside the palm, and used it as the lower limit of the bounding rectangle (which cuts off the wrist).
Now the recognizer detects the hand in the picture.

Detecting the hand in a picture works pretty well, but my problem occurs when I try to recognize the right class of hand gesture. I have implemented classification by training on pictures with algorithms like SVM, KNN, RandomForest etc., and this is the technique I don't want to change. I have 6 different hand poses (6 classes); the training set size is ~100 pictures/class and the test set size is ~10 pictures/class. The training and test pictures are made with the same technique described above and are converted to grayscale .bmp pictures. After that I resized the pictures to the same size and made a .pkl dataset like the MNIST one. Then I trained models with the following features:

Used HOG features to improve the result.
Used sklearn's StandardScaler preprocessing to improve performance.
Tried many of sklearn's algorithms (SVM, KNN, RandomForest, MLP, NaiveBayes, Decision Trees, ...) to find the best model.
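
To make the HOG parameters less of a black box, it can help to work out how long the resulting feature vector is. The sketch below is only an illustration: `hog_feature_length` is a hypothetical helper, not part of skimage, and it assumes the usual dense layout in which blocks overlap and slide one cell at a time. With the settings used in the scripts in this post (a 150x100 image, 5x5-pixel cells, 2x2-cell blocks, 9 orientations) it gives 19836 features per picture, far more than the roughly 600 training pictures.

```python
def hog_feature_length(image_shape, pixels_per_cell, cells_per_block, orientations):
    """Length of a dense HOG descriptor (hypothetical helper, not a skimage function).

    Assumes blocks overlap and slide by one cell in each direction.
    """
    rows, cols = image_shape
    cell_rows, cell_cols = pixels_per_cell
    block_rows, block_cols = cells_per_block
    # Number of whole cells that fit into the image
    n_cells_row = rows // cell_rows
    n_cells_col = cols // cell_cols
    # Number of block positions when sliding one cell at a time
    n_blocks_row = n_cells_row - block_rows + 1
    n_blocks_col = n_cells_col - block_cols + 1
    # Each block contributes one histogram per cell it contains
    return n_blocks_row * n_blocks_col * block_rows * block_cols * orientations


n_features = hog_feature_length((150, 100), (5, 5), (2, 2), 9)
print(n_features)  # 19836
```

Larger cells (for example 10x10 pixels) would shrink the vector considerably, which may be worth trying when there are only about 100 pictures per class.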

After training the models I got very good results (accuracy around 0.95-0.97 with the best models), and the confusion matrices also looked good, so I think the models have learned correctly.
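
One thing worth double-checking with scores like these is that the held-out pictures go through the scaler that was fitted on the training set, since that is the only scaler available when a single new picture is classified. The following is a minimal sketch on synthetic data (the arrays and the `scale_both_ways` helper are made up for illustration). It only shows that the two ways of scaling can produce different features, not that this explains the misclassifications.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def scale_both_ways(X_train, X_test):
    """Return the test set scaled with the training scaler and with its own scaler."""
    scaler = StandardScaler().fit(X_train)
    # What prediction on new data sees: statistics come from the training set
    reused = scaler.transform(X_test)
    # What happens if a fresh scaler is fitted on the test set itself
    refit = StandardScaler().fit(X_test).transform(X_test)
    return reused, refit


# Synthetic stand-in data: the "test" samples are shifted relative to the training samples
rng = np.random.RandomState(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
X_test = rng.normal(loc=2.0, scale=1.0, size=(10, 3))

reused, refit = scale_both_ways(X_train, X_test)
print("reused scaler, column means:", reused.mean(axis=0))
print("refit scaler, column means: ", refit.mean(axis=0))
```

Refitting on the test set always centers it at zero, which hides any shift between the training data and new data, so an evaluation done that way may look better than live predictions.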

The Problem: The classifier classifies but most of the time wrongly. At first I thought I should increase the dataset size, but then I noticed that someone managed to recognize hand gestures with only 1 picture/class, so now I think I have done something wrong. My models should also work, because the same technique worked with MNIST handwritten digits and my models classified almost every digit correctly. The problem could also be in the HOG parameters, as HOG is not very familiar to me. Also, my dataset pictures are cropped with no space around the hand poses, which may affect the result. If someone has an idea where I am going wrong, I would be very thankful.

EDIT 1: I included the detectHand and generateClassifiers files here because the cloud service didn't work. You have to take a test picture of your hand together with your face and adjust the HSV parameters to get a thresholded hand.

detectHand.py

import cv2
import numpy as np
import argparse as ap
from sklearn.externals import joblib
from skimage.feature import hog

def callback(x):
    pass

parser = ap.ArgumentParser()
parser.add_argument("-c", "--classiferPath", help="Path to Classifier File", required=True)
parser.add_argument("-i", "--image", help="Path to Image", required=True)
args = vars(parser.parse_args())
# Load the classifier
clf, pp = joblib.load(args["classiferPath"])

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

cv2.namedWindow('HSV')
# create trackbars for color change
cv2.createTrackbar('MinH','HSV',0,255, callback)   # Adjust your hand to get thresholded with HSV adjuster
cv2.createTrackbar('MaxH','HSV',25,255, callback)
cv2.createTrackbar('MinS','HSV',86,255, callback)
cv2.createTrackbar('MaxS','HSV',180,255, callback)
cv2.createTrackbar('MinV','HSV',131,255, callback)
cv2.createTrackbar('MaxV','HSV',255,255, callback)
while True:
    #read and resize image
    im = cv2.imread(args["image"])
    im = cv2.resize(im,(960,540))
    # get current position of six trackbars
    MinH = cv2.getTrackbarPos('MinH','HSV')
    MaxH = cv2.getTrackbarPos('MaxH','HSV')
    MinS = cv2.getTrackbarPos('MinS','HSV')
    MaxS = cv2.getTrackbarPos('MaxS','HSV')
    MinV = cv2.getTrackbarPos('MinV','HSV')
    MaxV = cv2.getTrackbarPos('MaxV','HSV')
    blur = cv2.blur(im,(3,3))
    # convert BGR to HSV and threshold on the trackbar ranges
    hsv = cv2.cvtColor(blur, cv2.COLOR_BGR2HSV)
    lower = np.array([MinH, MinS, MinV])
    upper = np.array([MaxH, MaxS, MaxV])
    mask2 = cv2.inRange(hsv,lower,upper)

    #Kernel matrices for morphological transformation    
    kernel_square = np.ones((11,11),np.uint8)
    kernel_ellipse= cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(5,5))
    #Perform morphological transformations to filter out the background noise
    #Dilation increases the skin color area
    #Erosion decreases the skin color area
    dilation = cv2.dilate(mask2,kernel_ellipse,iterations = 1)
    erosion = cv2.erode(dilation,kernel_square,iterations = 1)       
    filtered = cv2.medianBlur(erosion,5)
    ret,thresh = cv2.threshold(filtered,127,255,0)
    # detect faces from picture and remove it
    gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)
    faces = face_cascade.detectMultiScale(gray, 1.3, 3, minSize=(20,20), flags=cv2.CASCADE_SCALE_IMAGE)
    for (x,y,w,h) in faces:
        # Black out the face; the opposite corner is (x + width, y + height)
        cv2.rectangle(thresh, (x,y),(x+w,y+h), (0,0,0), cv2.FILLED)

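    # [-2:] keeps this working whether findContours returns 2 values (OpenCV 2.x/4.x) or 3 (OpenCV 3.x)
    contours, hier = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2:]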
    max_area=100
    ci=0    
    for i in range(len(contours)):
        cnt=contours[i]
        area = cv2.contourArea(cnt)
        if(area>max_area):
            max_area=area
            ci=i  

    #Largest area contour 
    im1 = im.copy()           
    cnts = contours[ci]
    rect = cv2.boundingRect(cnts)
    x1,y1,w1,h1 = rect

    #center of palm
    maxdistance=0
    pt=(0,0)
    for index_y in range(int(y1+0.25*h1),int(y1+0.8*h1)):
        for index_x in range(int(x1+0.3*w1),int(x1+0.9*w1)):
            distance=cv2.pointPolygonTest(cnts,(index_x,index_y), True)
            if(distance>maxdistance):
                maxdistance=distance
                pt = (index_x,index_y)
    cv2.circle(im1,pt,int(maxdistance),(255,0,0),2)

    cv2.rectangle(im1, (x1,y1),(x1+w1,pt[1]+int(maxdistance)), (0,0,255), 3)
    cropped_image = thresh[y1:pt[1]+int(maxdistance),x1:x1+w1]
    #edged = cv2.Canny(cropped_image, 100,200)

    roi = cv2.resize(cropped_image, (100, 150), interpolation=cv2.INTER_AREA)
    # Calculate the HOG features
    roi_hog_fd = hog(roi, orientations=9, pixels_per_cell=(5, 5), cells_per_block=(2, 2), visualise=False)
    roi_hog_fd = pp.transform(np.array([roi_hog_fd], 'float64'))
    nbr = clf.predict(roi_hog_fd)
    cv2.putText(im1, str(nbr[0]), (x1,y1),cv2.FONT_HERSHEY_DUPLEX, 2, (0, 255, 255), 3)


    cv2.imshow('Output', im1)
    cv2.imshow('Hand', cropped_image)
    cv2.imshow('roi', roi)
    c= cv2.waitKey(5)
    if c==27:
        break
cv2.destroyAllWindows()

generateClassifiers.py

#!/usr/bin/python
# Import the modules
from sklearn.externals import joblib
import pickle
from skimage.feature import hog
from sklearn import preprocessing
import numpy as np
from collections import Counter
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import SGDClassifier, LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
import cv2

def ModelRandomQuessing(hog_features, labels, pp):
    model = "RandomQuessing"
    clf = DummyClassifier()
    clf.fit(hog_features, labels)
    joblib.dump((clf, pp), "model1randomquessing.pkl", compress=3)
    return (model, clf)

def ModelLinearSVM(hog_features, labels, pp):
    model = "LinearSVM"
    clf = SGDClassifier(n_jobs=-1)
    clf.fit(hog_features, labels)
    joblib.dump((clf, pp), "model2linearsvm.pkl", compress=3)
    return (model, clf)

def ModelKNN(hog_features, labels, pp):
    model = "KNearestNeighbors"
    clf = KNeighborsClassifier(n_jobs=-1)
    clf.fit(hog_features, labels)
    joblib.dump((clf, pp), "model3knn.pkl", compress=3)
    return (model, clf)

def ModelSVM(hog_features, labels, pp):
    model = "SupportVectorMachine"
    clf = SVC(kernel="rbf")
    clf.fit(hog_features, labels)
    joblib.dump((clf, pp), "model4svm.pkl", compress=3)
    return (model, clf)

def ModelDecisionTree(hog_features, labels, pp):
    model = "DecisionTree"
    clf = DecisionTreeClassifier()
    clf.fit(hog_features, labels)
    joblib.dump((clf, pp), "model5decisiontree.pkl", compress=3)
    return (model, clf)

def ModelRandomForest(hog_features, labels, pp):
    model = "RandomForest"
    clf = RandomForestClassifier()
    clf.fit(hog_features, labels)
    joblib.dump((clf, pp), "model6randomforest.pkl", compress=3)
    return (model, clf)

def ModelAdaboost(hog_features, labels, pp):
    model = "Adaboost"
    clf = AdaBoostClassifier()
    clf.fit(hog_features, labels)
    joblib.dump((clf, pp), "model7adaboost.pkl", compress=3)
    return (model, clf)

def ModelGaussianNB(hog_features, labels, pp):
    model = "GaussianNaiveBayes"
    clf = GaussianNB()
    clf.fit(hog_features, labels)
    joblib.dump((clf, pp), "model8gaussiannb.pkl", compress=3)
    return (model, clf)

def ModelLDA(hog_features, labels, pp):
    model = "LinearDiscriminantAnalysis"
    clf = LinearDiscriminantAnalysis()
    clf.fit(hog_features, labels)
    joblib.dump((clf, pp), "model9lda.pkl", compress=3)
    return (model, clf)

def ModelQDA(hog_features, labels, pp):
    model = "QuadraticDiscriminantAnalysis"
    clf = QuadraticDiscriminantAnalysis()
    clf.fit(hog_features, labels)
    joblib.dump((clf, pp), "model10qda.pkl", compress=3)
    return (model, clf)

def ModelLogisticRegression(hog_features, labels, pp):
    model = "LogisticRegression"
    clf = LogisticRegression(n_jobs=-1)
    clf.fit(hog_features, labels)
    joblib.dump((clf, pp), "model11logisticregression.pkl", compress=3)
    return (model, clf)

def ModelMLP(hog_features, labels, pp):
    model = "MultilayerPerceptron"
    clf = MLPClassifier(activation='relu',hidden_layer_sizes=(200,200),solver='lbfgs',alpha=10,verbose=True)
    clf.fit(hog_features, labels)
    joblib.dump((clf, pp), "model12mlp.pkl", compress=3)
    return (model, clf)

def ModelBestKNN(hog_features, labels, pp):
    model = "BestKNearestNeighbors"
    clf = KNeighborsClassifier(n_jobs=-1,weights='distance',n_neighbors=4)
    clf.fit(hog_features, labels)
    joblib.dump((clf, pp), "model13bestknn.pkl", compress=3)
    return (model, clf)

def ModelBestSVM(hog_features, labels, pp):
    model = "BestSupportVectorMachine"
    clf = SVC(kernel='rbf',cache_size=2000,C=10.0,gamma='auto',class_weight='balanced')
    clf.fit(hog_features, labels)
    joblib.dump((clf, pp), "model14bestsvm.pkl", compress=3)
    return (model, clf)

def ModelBestRandomForest(hog_features, labels, pp):
    model = "BestRandomForest"
    clf = RandomForestClassifier(n_jobs=-1,n_estimators=500,max_features='auto')
    clf.fit(hog_features, labels)
    joblib.dump((clf, pp), "model15bestrf.pkl", compress=3)
    return (model, clf)


def accuracy(modelclf, X_test, Y_test):
    model, clf = modelclf
    predicted = clf.predict(X_test)
    print("Classification report for classifier %s:\n%s\n"
      % (model, classification_report(Y_test, predicted)))
    print("Confusion matrix:\n%s" % confusion_matrix(Y_test, predicted))


if __name__=='__main__':
    # Load the dataset
    with open('handdetection.pkl', 'rb') as f:
        data = pickle.load(f)
    # Extract the features and labels
    X = data[0]
    Y = data[1]
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1)
    # Extract the hog features
    list_X_train = []
    for trainsample in X_train:
        fd, hog_image = hog(trainsample.reshape((150, 100)), orientations=9, pixels_per_cell=(5, 5), cells_per_block=(2, 2), visualise=True)
        list_X_train.append(fd)
    X_train = np.array(list_X_train, 'float64')
    # Normalize the features
    pp = preprocessing.StandardScaler().fit(X_train)
    X_train = pp.transform(X_train)
    #Same for testset
    list_X_test = []
    for testsample in X_test:
        fd = hog(testsample.reshape((150, 100)), orientations=9, pixels_per_cell=(5, 5), cells_per_block=(2, 2), visualise=False)
        list_X_test.append(fd)
    X_test = np.array(list_X_test, 'float64')
    # Reuse the scaler fitted on the training set; refitting on the test set would scale it differently
    X_test = pp.transform(X_test)
    print("Count of samples per class in training set: %s" % Counter(Y_train))
    #accuracy(ModelRandomQuessing(X_train, Y_train, pp),X_test,Y_test)
    #accuracy(ModelLinearSVM(X_train, Y_train, pp),X_test,Y_test)
    accuracy(ModelKNN(X_train, Y_train, pp),X_test,Y_test)
    accuracy(ModelSVM(X_train, Y_train, pp),X_test,Y_test)
    #accuracy(ModelDecisionTree(X_train, Y_train, pp),X_test,Y_test)
    accuracy(ModelRandomForest(X_train, Y_train, pp),X_test,Y_test)
    #accuracy(ModelAdaboost(X_train, Y_train, pp),X_test,Y_test)
    #accuracy(ModelGaussianNB(X_train, Y_train, pp),X_test,Y_test)
    #accuracy(ModelLDA(X_train, Y_train, pp),X_test,Y_test)
    #accuracy(ModelQDA(X_train, Y_train, pp),X_test,Y_test)
    #accuracy(ModelLogisticRegression(X_train, Y_train, pp),X_test,Y_test)
    accuracy(ModelMLP(X_train, Y_train, pp),X_test,Y_test)
    #accuracy(ModelBestKNN(X_train, Y_train, pp),X_test,Y_test)
    #accuracy(ModelBestSVM(X_train, Y_train, pp),X_test,Y_test)
    #accuracy(ModelBestRandomForest(X_train, Y_train, pp),X_test,Y_test)
    while True:
        cv2.imshow('hog', hog_image)
        c = cv2.waitKey(5)
        if c==27:
            break
    cv2.destroyAllWindows()

I will try Dropbox again for the face Haar cascade and the handdetection.pkl dataset. Put all of these files in the same folder.

Haar: https://www.dropbox.com/s/zdc096drhbr1sx3/haarcascade_frontalface_default.xml?dl=0
Dataset: https://www.dropbox.com/s/pieywxg8rl8rsw4/handdetection.pkl?dl=0

Patrick Liu · Accepted Answer · 2017-08-02T15:46:42

After training the models I got very good results (accuracy around 0.95-0.97 with the best models), and the confusion matrices also looked good, so I think the models have learned correctly.

The Problem: The classifier classifies but most of the time wrongly.

This could be an indication of overfitting if no validation set was held out and used to tune the parameters of whatever model you ended up using to predict. If this is the case, something like grid search cross validation may help.
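
As a rough illustration of what grid search with cross-validation could look like in sklearn, here is a minimal sketch. It runs on synthetic data from `make_classification` as a stand-in for the HOG features, and the `tune_svm` helper and the parameter grid are arbitrary choices for the example, not tuned recommendations. The scaler sits inside the pipeline so it is refitted on the training part of every fold, and a separate test split is held out and never used for tuning.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def tune_svm(X, y):
    """Grid-search an RBF SVM with 5-fold CV and score it on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    # Scaling is part of the pipeline, so each CV fold fits its own scaler
    pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])
    # Illustrative grid only; real ranges depend on the data
    param_grid = {"svc__C": [0.1, 1.0, 10.0], "svc__gamma": [0.001, 0.01, 0.1]}
    search = GridSearchCV(pipe, param_grid, cv=5)
    search.fit(X_train, y_train)
    # The held-out split was not seen during the search
    return search, search.score(X_test, y_test)


# Synthetic 6-class data standing in for the real feature matrix
X, y = make_classification(n_samples=300, n_features=40, n_informative=10,
                           n_classes=6, random_state=0)
search, held_out_accuracy = tune_svm(X, y)
print("best parameters:", search.best_params_)
print("mean CV accuracy:", search.best_score_)
print("held-out accuracy:", held_out_accuracy)
```

If the cross-validated accuracy is much higher than the accuracy on the held-out split or on fresh pictures, that points toward overfitting or a mismatch between the training data and the live input.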

You can tell that the model is overfitting when performance on the validation set begins to suffer while performance on the training set is still improving.
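
One way to look for that pattern is sklearn's `validation_curve`, which reports training and validation scores as a single model parameter is varied. The sketch below is an assumption-laden example rather than a diagnosis of the question's models: it uses synthetic data and a decision tree, with `max_depth` as the complexity knob.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic 6-class data standing in for the real feature matrix
X, y = make_classification(n_samples=300, n_features=40, n_informative=10,
                           n_classes=6, random_state=0)

depths = [1, 2, 4, 8, 16]
# Both score arrays have shape (len(depths), n_folds)
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5)

train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
for depth, tr, va in zip(depths, train_mean, val_mean):
    print("max_depth=%2d  train=%.2f  validation=%.2f" % (depth, tr, va))
```

A training score that keeps climbing while the validation score flattens or falls is the gap described above.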
