1
votes

I would like to train a convolutional neural network (convnet) to detect the presence of hands in images.

The difficulties are: 1/ the images will contain objects other than hands, as in a picture of a group of people where the hands are just a small part of the image; 2/ hands can take many orientations and shapes (open or closed, seen from different angles, etc.).

I was thinking of training the convnet on a big set of cropped hand images (+ random images without hands) and then apply the classifier on all the subsquares of my images. Is this a good approach?

Are there other examples of complex 2-class convnets / RNNs I could use for inspiration?

Thank you!


2 Answers

1
votes

This seems more a matter of finding good labeled training data than of choosing a network. A neural network can learn the difference between "pictures of hands" and "pictures which incidentally include hands", but it needs some labeled examples to figure out which category an image belongs to.

You might want to take a look at this: http://www.socher.org/index.php/Main/ParsingNaturalScenesAndNaturalLanguageWithRecursiveNeuralNetworks

1
votes

> I was thinking of training the convnet on a big set of cropped hand images (+ random images without hands) and then apply the classifier on all the subsquares of my images. Is this a good approach?

Yes, I believe this would be a good approach. However, note that when you say "random", you should probably sample the negatives from images where hands are likely to appear. It really depends on your use case, and you have to tune the data set to fit it.

I would build the data set something like this:

  1. Crop the hands out of a big image.
  2. Sample X patches from that same image, but nowhere near the hand(s).
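The two steps above could be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: hand annotations are taken to be `(x, y, w, h)` boxes, patches are square, and the names `overlaps`, `sample_negatives` and the `margin` parameter are made up for the example.

```python
import random

def overlaps(a, b, margin=0):
    """True if box a overlaps box b after growing b by `margin` on every side.
    Boxes are (x, y, w, h) tuples; this format is an assumption."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    bx, by, bw, bh = bx - margin, by - margin, bw + 2 * margin, bh + 2 * margin
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def sample_negatives(img_w, img_h, hand_boxes, patch, count,
                     margin=16, seed=0, max_tries=10000):
    """Sample up to `count` square patches that stay clear of every hand box."""
    rng = random.Random(seed)
    negatives = []
    tries = 0
    while len(negatives) < count and tries < max_tries:
        tries += 1
        # randint is inclusive, so the patch always fits inside the image.
        x = rng.randint(0, img_w - patch)
        y = rng.randint(0, img_h - patch)
        box = (x, y, patch, patch)
        if not any(overlaps(box, hand, margin) for hand in hand_boxes):
            negatives.append(box)
    return negatives

# Hypothetical annotation: one hand in a 640x480 image.
hand_boxes = [(300, 200, 64, 64)]
negatives = sample_negatives(640, 480, hand_boxes, patch=64, count=20)
```

The returned boxes can then be cropped from the same image and labeled "no hand", so the negatives come from the same kind of scenes as the positives.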

If, however, you choose to do something like this instead, you might get bad results:

  1. Crop images of hands from a big image.
  2. Download 1 million images (an exaggeration) that definitely don't have hands, for example deserts, oceans, skies, caves, mountains, basically lots of scenery, and use these as your "random images without hands".

The reason is that your images already follow an underlying distribution. I assume most of them are pictures of groups of friends, parties at a house, or scenes with buildings in the background. Under that assumption, introducing scenery images could distort the distribution: the classifier may learn to tell party photos from landscapes rather than hands from non-hands.

Therefore, be really careful when using "random images"!

> on all the subsquares of my images

As for this part of your question, you are essentially running a sliding window over the entire image. Yes, in practice it would work. But if speed matters, it may not be a good idea, because the classifier has to run once per window. You might want to run a segmentation algorithm first to narrow down the search space.
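To make the sliding-window idea concrete, here is a minimal sketch using NumPy. The `classifier` argument stands in for the trained convnet; `dummy_classifier` below is a placeholder that just returns mean brightness, and the window size, stride and threshold are arbitrary values chosen for the example.

```python
import numpy as np

def sliding_windows(image, window, stride):
    """Yield (x, y, patch) for every window-sized square at the given stride."""
    height, width = image.shape[:2]
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield x, y, image[y:y + window, x:x + window]

def detect(image, classifier, window=64, stride=32, threshold=0.5):
    """Run the classifier on every window and keep those scoring >= threshold."""
    hits = []
    for x, y, patch in sliding_windows(image, window, stride):
        score = classifier(patch)
        if score >= threshold:
            hits.append((x, y, score))
    return hits

def dummy_classifier(patch):
    # Placeholder for the trained convnet: mean brightness as a fake "hand score".
    return float(patch.mean())

# Toy image: black background with one bright 64x64 square standing in for a hand.
image = np.zeros((128, 192), dtype=np.float32)
image[32:96, 64:128] = 1.0
hits = detect(image, dummy_classifier, window=64, stride=32, threshold=0.9)
```

Even this small 128x192 image produces 15 windows at a stride of 32, and the count grows quickly with image size and smaller strides, which is why narrowing the search space first can pay off.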

> Are there other examples of complex 2-class convnets / RNNs I could use for inspiration?

I'm not sure what you mean by complex 2-class convnets. I'm not familiar with RNNs, so let me focus on convnets. You can basically define the convolutional net yourself: the size of the convolutional layers, how many layers there are, which pooling method you use, how big the fully connected layer is, and so on. The last layer is basically a softmax layer, where the net decides which class the input belongs to. If you have 2 classes, the last layer has 2 nodes; if you have 3, then 3, and so on. So it can range from 2 to perhaps even 1000. I've not heard of convnets that have more than 1000 classes, but I could be ill-informed. I hope this helps!
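As a rough illustration of the point about the last layer, here is a tiny untrained forward pass in plain NumPy: one convolutional layer, ReLU, 2x2 max pooling, one fully connected layer and a softmax. All sizes (4 filters of 5x5, a 32x32 grayscale input) and the random weights are assumptions for the example, not a recommended architecture.

```python
import numpy as np

def conv2d_valid(image, kernels):
    """Valid cross-correlation of a 2-D image with a (k, kh, kw) stack of kernels."""
    k, kh, kw = kernels.shape
    h, w = image.shape
    out = np.zeros((k, h - kh + 1, w - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            region = image[i:i + kh, j:j + kw]
            out[:, i, j] = np.sum(kernels * region, axis=(1, 2))
    return out

def max_pool2x2(maps):
    """Non-overlapping 2x2 max pooling over each feature map."""
    k, h, w = maps.shape
    h2, w2 = h // 2, w // 2
    trimmed = maps[:, :h2 * 2, :w2 * 2]
    return trimmed.reshape(k, h2, 2, w2, 2).max(axis=(2, 4))

def softmax(logits):
    shifted = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def forward(image, kernels, fc_weights, fc_bias):
    features = np.maximum(conv2d_valid(image, kernels), 0.0)  # conv + ReLU
    pooled = max_pool2x2(features)
    logits = fc_weights @ pooled.ravel() + fc_bias            # one logit per class
    return softmax(logits)

rng = np.random.default_rng(0)
n_classes = 2  # "hand" vs "no hand"; the only place the class count appears
kernels = rng.normal(scale=0.1, size=(4, 5, 5))
# 32x32 input -> 28x28 after the 5x5 conv -> 14x14 after pooling.
fc_weights = rng.normal(scale=0.1, size=(n_classes, 4 * 14 * 14))
fc_bias = np.zeros(n_classes)
probs = forward(rng.random((32, 32)), kernels, fc_weights, fc_bias)
```

Setting `n_classes` to 3 or 1000 changes only the number of rows in `fc_weights` and the length of `probs`; the rest of the network stays the same. In practice you would use a deep learning framework and train the weights rather than leave them random.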