
I'm fine-tuning the GoogleNet network with Caffe on my own dataset. If I use IMAGE_DATA layers as input, learning takes place. However, I need to switch to an HDF5 layer for further extensions that I require. When I use HDF5 layers, no learning takes place.

I am using the exact same input images, and the labels also match. I have also checked that the data in the .h5 files loads correctly: it does, and Caffe finds the number of examples I feed it as well as the correct number of classes (2).

This leads me to think that the issue lies in the transformations I am performing manually (since HDF5 layers do not perform any built-in transformations). The code for these is below. I do the following:

  • Convert image from RGB to BGR
  • Resize it to 256x256 so I can subtract the mean file from ImageNet (included in the Caffe library)
  • Since the original GoogleNet prototxt does not divide by 255, I also do not (see here)
  • I resize the image down to 224x224, which is the crop size required by GoogleNet
  • I transpose the image as needed to satisfy CxHxW, as required by Caffe
  • At the moment I am not performing data augmentation, which could be enabled by setting oversample=True.
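To make the steps above concrete, here is a numpy-only sketch on dummy data. The resize to 256x256 is elided so the sketch stays self-contained, the mean array is a random stand-in for the ImageNet mean (assumed BGR, CxHxW), and the sketch center-crops to 224x224, whereas my code below resizes instead:

```python
import numpy as np

# Stand-ins: a 256x256 RGB image (resizing from the original size is elided)
# and the 3x256x256 ImageNet mean, assumed to be stored in BGR, CxHxW order.
rgb = np.random.rand(256, 256, 3).astype('float32')
mean_bgr = np.random.rand(3, 256, 256).astype('float32')

bgr = rgb[..., ::-1]                # RGB -> BGR
chw = np.transpose(bgr, (2, 0, 1))  # HxWxC -> CxHxW, as Caffe expects
chw = chw - mean_bgr                # subtract the (stand-in) ImageNet mean

# 224x224 center crop (my code below resizes down to 224 instead):
off = (256 - 224) // 2
crop = chw[:, off:off + 224, off:off + 224]
assert crop.shape == (3, 224, 224)
```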

Can anyone see anything wrong with this approach? Is data augmentation so critical that no learning would take place without it?

The HDF5 conversion code

import numpy as np
import caffe
from scipy import ndimage

IMG_RESHAPE = 224
IMG_UNCROPPED = 256

def resize_convert(img_names, path=None, oversample=False):
    '''
    Load images, set to BGR mode and transpose to CxHxW
    and subtract the Imagenet mean. If oversample is True, 
    perform data augmentation.

    Parameters:
    ---------
    img_names (list): list of image names to be processed.
    path (string): path to images.
    oversample (bool): if True then data augmentation is performed
        on each image, and 10 crops of size 224x224 are produced 
        from each image. If False, then a single 224x224 crop is produced.
    '''

    path = path if path is not None else ''
    if not oversample:
        all_imgs = np.empty((len(img_names), 3, IMG_RESHAPE, IMG_RESHAPE), dtype='float32')
    else:
        all_imgs = np.empty((len(img_names), 3, IMG_UNCROPPED, IMG_UNCROPPED), dtype='float32')

    #load the imagenet mean
    mean_val = np.load('/path/to/imagenet/ilsvrc_2012_mean.npy')

    for i, img_name in enumerate(img_names):
        img = ndimage.imread(path+img_name, mode='RGB') # Read as HxWxC

        #subtract the mean of Imagenet
        #First, resize to 256 so we can subtract the mean of dims 256x256 
        img = img[...,::-1] #Convert RGB TO BGR
        img = caffe.io.resize_image(img, (IMG_UNCROPPED, IMG_UNCROPPED), interp_order=1)
        img = np.transpose(img, (2, 0, 1))  #HxWxC => CxHxW 
        #The mean is given in Caffe channel order: 3xHxW
        #Assume it is also given in BGR order
        img = img - mean_val

        #set to 0-1 range => I don't think googleNet requires this
        #I tried both and it didn't make a difference
        #img = img/255

        #resize images down since GoogleNet accepts 224x224 crops
        if not oversample:
            img = np.transpose(img, (1,2,0))  # CxHxW => HxWxC 
            img = caffe.io.resize_image(img, (IMG_RESHAPE, IMG_RESHAPE), interp_order=1)
            img = np.transpose(img, (2,0,1)) #convert to CxHxW for Caffe 
        all_imgs[i, :, :, :] = img

    #oversampling requires NxHxWxC order
    if oversample:
        all_imgs = np.transpose(all_imgs, (0, 2, 3, 1)) #NxCxHxW => NxHxWxC
        all_imgs = caffe.io.oversample(all_imgs, (IMG_RESHAPE, IMG_RESHAPE))
        all_imgs = np.transpose(all_imgs, (0, 3, 1, 2)) #back to NxCxHxW for Caffe

    return all_imgs
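Once the arrays are built, they still need to be written to disk. A minimal sketch of how I write them to an .h5 file for the HDF5_DATA layer — the dataset names 'data' and 'label' must match the `top` names in the prototxt, and the float32 cast is deliberate, since Caffe reads HDF5 data as single precision:

```python
import h5py
import numpy as np

def write_h5(filename, images, labels):
    """Write images (N, 3, 224, 224) and labels (N,) to an HDF5 file
    whose dataset names match the 'data' and 'label' tops in the prototxt."""
    with h5py.File(filename, 'w') as f:
        f.create_dataset('data', data=images.astype('float32'))
        f.create_dataset('label', data=labels.astype('float32'))
```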

Relevant differences between IMAGE_DATA and HDF5 prototxt files

name: "GoogleNet"
layers {
  name: "data"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "/path/to/train_list.txt"
    batch_size: 32
  }
  include: { phase: TRAIN }
}
layers {
  name: "data"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "/path/to/valid_list.txt"
    batch_size:10
  }
  include: { phase: TEST }
}
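For completeness, the source file referenced by hdf5_data_param is not itself an HDF5 file but a plain text file listing the paths of the .h5 files, one per line (paths illustrative):

```text
/path/to/train_data_part1.h5
/path/to/train_data_part2.h5
```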

Update

When I say no learning is taking place, I mean that my training loss does not decrease consistently when using HDF5 data, whereas it does with IMAGE_DATA. In the images below, the first plot shows the change in training loss for the IMAGE_DATA network, and the second for the HDF5 network.

One possibility I am considering is that the network is overfitting to each of the .h5 files I feed it. At the moment I am using data augmentation, but all of the augmented versions of a single input image are stored in the same .h5 file, along with other examples. I think this could cause the network to overfit to that specific .h5 file, though I am not sure whether that is what the second plot suggests.
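One way to rule that out — sketched here assuming the examples are already in memory as numpy arrays — is to shuffle all examples globally before splitting them across .h5 files, so each file holds a random mix rather than all augmentations of one image:

```python
import numpy as np

def shuffle_and_split(images, labels, n_files, seed=0):
    """Shuffle examples globally, then split them into n_files chunks,
    so augmented versions of one image are spread across files."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(len(images))
    img_chunks = np.array_split(images[perm], n_files)
    lbl_chunks = np.array_split(labels[perm], n_files)
    return list(zip(img_chunks, lbl_chunks))
```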

[Plot: training loss with IMAGE_DATA input]
[Plot: training loss with HDF5 input]

1. why don't you read the images using caffe.io.load_image? – Shai
2. why are you resizing with interp_order=1? try a higher order. – Shai
3. changing the image size from 256 to 224 should not be done by resizing, but rather by cropping – Shai
Thanks Shai. I tried those, no luck. I must have a bug elsewhere. – angela
@Shai, I added more info on what I mean by no learning. – angela

1 Answer


I faced the same problem and found out that, for some reason, doing the transformation manually as you are doing in your code causes the images to be all black (all zeros). Try to debug your code and see if that is happening. The solution is to use the same methodology explained in the Caffe classification tutorial at http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb, the part where you see:

# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))  # move image channels to outermost dimension
transformer.set_mean('data', mu)            # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255)      # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0))  # swap channels from RGB to BGR

then, a few lines down:

image = caffe.io.load_image(caffe_root + 'examples/images/cat.jpg')
transformed_image = transformer.preprocess('data', image)
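For reference, the mu in that snippet is the per-channel mean derived from the same .npy mean file. A numpy-only sketch of that step, with a random array standing in for the loaded mean file:

```python
import numpy as np

# Stand-in for np.load('.../ilsvrc_2012_mean.npy'), which has shape (3, 256, 256)
mean_arr = np.random.rand(3, 256, 256)
mu = mean_arr.mean(1).mean(1)  # average over H and W -> per-channel BGR mean
assert mu.shape == (3,)
```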