
I am quite new to Caffe and deep learning, so please bear with my inexperience and naive questions. Now to the point: I want to train GoogLeNet on the FER2013 dataset (it consists of faces, and its purpose is to recognise which of 7 categories each face falls into). However, the data isn't in an image format but rather an array of 48x48 = 2304 values, each between 0 and 255. So, in order to create the lmdb files needed to feed Caffe, I wrote the following Python script to transform the arrays into real images.

import numpy as np
from PIL import Image
import csv
import itertools

with open('fer2013.csv', 'rb') as f:
    mycsv = csv.reader(f)
    with open('testset.txt', 'a') as myfile:
        for i, row in enumerate(itertools.islice(mycsv, 340)):
            # row[0] is the label (0-6); row[1] holds the 48x48 = 2304
            # space-separated pixel values, already in the range 0-255.
            data = np.array(row[1].split(), dtype=np.uint8)
            # The values are already 0-255, so no scaling is needed before
            # saving (scaling by 255 would overflow uint8 and corrupt the image).
            im = Image.fromarray(data.reshape((48, 48)))
            directory = 'imagestotest/'
            path = 'image' + str(i) + '.jpg'
            im.save(directory + path)
            # Write "filename label" lines for convert_imageset.
            myfile.write(path + ' ' + row[0] + '\n')
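
As a quick sanity check that the conversion is faithful (a minimal sketch, assuming the file names from the script above and comparing only the first CSV row against image0.jpg):

import numpy as np
from PIL import Image
import csv

# Re-read the first row of the CSV and reload the first saved image.
with open('fer2013.csv', 'rb') as f:
    row = next(csv.reader(f))
original = np.array(row[1].split(), dtype=np.uint8).reshape((48, 48))
saved = np.array(Image.open('imagestotest/image0.jpg').convert('L'))

# JPEG compression is lossy, so small differences are expected;
# large ones would indicate a conversion bug.
print('max abs difference: %d' % np.abs(original.astype(int) - saved.astype(int)).max())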

I then prepare my lmdb files with the following command:

GLOG_logtostderr=1 ./deep-learning/caffe/build/tools/convert_imageset --resize_height=256 --resize_width=256 --shuffle /home/panos/Desktop/images/ /home/panos/Desktop/trainingset.txt /home/panos/Desktop/train_lmdb
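
For reference, convert_imageset expects each line of the listing file to be an image path (relative to the root folder given as the first argument) followed by the numeric label, which matches what the script writes (the labels below are illustrative):

image0.jpg 2
image1.jpg 0

Just make sure the root folder passed on the command line is the same directory the images were saved into.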

Finally, I compute the image mean, change train_val.prototxt, and set the loss1, loss2, and loss3 layers to num_output: 7 (as I have 7 classes, 0-6).
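
For the mean, Caffe ships a compute_image_mean tool; assuming the lmdb path from above and a placeholder output name, the command would look like:

./deep-learning/caffe/build/tools/compute_image_mean /home/panos/Desktop/train_lmdb /home/panos/Desktop/mean.binaryproto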

I run my model (training size: 5000, test size: 340) and the accuracy is rather disappointing: close to 23% (top-1) and 88.8% (top-5).

Could this be a hyperparameter configuration problem, or are my input files not created correctly? (I have my doubts about my Python cooking skills.)

If it helps, my main hyperparameters are: test_iter: 7, test_interval: 40, base_lr: 0.001, momentum: 0.9, weight_decay: 0.0002.
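
For context, a solver.prototxt carrying those values would look roughly like the sketch below (the net path is a placeholder, not taken from the question):

net: "train_val.prototxt"  # placeholder path
test_iter: 7
test_interval: 40
base_lr: 0.001
momentum: 0.9
weight_decay: 0.0002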

Thanks in advance!

You are resizing 48x48 into 256x256. This might be a problem. Did you use a GoogLeNet pretrained model for weight initialization? – lnman
@lnman How can I do that? And what does it offer? Thanks a million for the quick reply! – Damager

1 Answer


To use a pretrained model you will need to download the GoogLeNet model (bvlc_googlenet.caffemodel, from the Caffe Model Zoo) first. Then you can use this command:

caffe train --solver solver.prototxt --weights bvlc_googlenet.caffemodel
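
One practical Caffe detail worth adding (standard fine-tuning practice, not spelled out in the question): Caffe copies pretrained weights into layers by matching layer names, so after changing num_output from 1000 to 7 you should also rename the three classifier layers (e.g. loss3/classifier), otherwise loading bvlc_googlenet.caffemodel will fail with a shape mismatch. A sketch for one of them, with an arbitrary new name:

layer {
  name: "loss3/classifier_fer"  # renamed so Caffe reinitializes it
  type: "InnerProduct"
  bottom: "pool5/7x7_s1"
  top: "loss3/classifier_fer"
  inner_product_param {
    num_output: 7  # 7 FER2013 classes instead of 1000 ImageNet ones
  }
}

The same renaming applies to loss1/classifier and loss2/classifier.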

One of the main problems with training is weight initialization. Without proper initialization the model may fail to converge and will show poor performance. You can't initialize all weights to the same value; other common recommendations are that weights should be sparse, orthogonal, normalized, etc. So it is usually recommended to start from the weights of a pretrained model; this is a form of transfer learning. For details on weight initialization you can see Karpathy's CS231n notes, and also "What are good initial weights in a neural network?". If you want a deeper understanding, see the following papers.

[1] Bengio, Yoshua. "Practical recommendations for gradient-based training of deep architectures." Neural Networks: Tricks of the Trade. Springer Berlin Heidelberg, 2012. 437-478.

[2] LeCun, Yann, Léon Bottou, Genevieve B. Orr, and Klaus-Robert Müller. "Efficient BackProp." Neural Networks: Tricks of the Trade. Springer, 1998.

[3] Glorot, Xavier, and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks." International Conference on Artificial Intelligence and Statistics, 2010.