What's the difference between convolution in Keras vs Caffe?

Question

I'm trying to replicate a large Caffe network in Keras (with the TensorFlow backend), but I'm having a lot of trouble doing it, even for a single convolutional layer.

Simple Convolution in General:

Let's say we have a 4D input with shape (1, 500, 500, 3), and we have to perform a single convolution on this input with 96 filters, a kernel size of 11, and 4x4 strides.
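For reference, the spatial output size used later in this question (123) follows from the usual formula for an unpadded convolution. The sketch below only checks that arithmetic; the helper name `conv_output_size` is made up for illustration and is not part of Keras or Caffe.

```python
def conv_output_size(input_size, kernel_size, stride, pad=0):
    """Spatial output size of a convolution with symmetric padding `pad`."""
    return (input_size + 2 * pad - kernel_size) // stride + 1

# 500x500 input, 11x11 kernel, stride 4, no padding
out = conv_output_size(500, 11, 4)
print(out)  # 123
```

With 96 filters, the layer output therefore holds 96 feature maps of 123x123 values each.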

Let's set our weight and input variables:

w = np.random.rand(11, 11, 3, 96)  # weights 1
b = np.random.rand(96)  # weights 2 (bias)

x = np.random.rand(500, 500, 3)

Simple Convolution in Keras:

This is how it could be defined in Keras:

import keras
from keras.layers import Input
from keras.layers import Conv2D
import numpy as np

inp = Input(shape=(500, 500, 3))
conv1 = Conv2D(filters=96, kernel_size=11, strides=(4, 4), activation=keras.activations.relu, padding='valid')(inp)


model = keras.Model(inputs=[inp], outputs=conv1)
model.layers[1].set_weights([w, b])  # set weights for convolutional layer


predicted = model.predict([x.reshape(1, 500, 500, 3)])
print(predicted.reshape(1, 96, 123, 123))  # reshape keras output in the form of Caffe

Simple Convolution in Caffe:

simple.prototxt:

name: "simple"
input: "inp"
input_shape {
  dim: 1
  dim: 3
  dim: 500
  dim: 500
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "inp"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    pad: 0
    stride: 4
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}

Caffe in Python:

import caffe

net = caffe.Net('simple.prototxt', caffe.TEST)
net.params['conv1'][0].data[...] = w.reshape(96, 3, 11, 11)  # set weights 1
net.params['conv1'][1].data[...] = b  # set weights 2 (bias)
net.blobs['inp'].reshape(1, 3, 500, 500) # reshape input layer to fit our input array x
print(net.forward(inp=x.reshape(1, 3, 500, 500)).get('conv1'))

Problem:

If we execute both snippets, we notice that their outputs differ. I understand that there are a few differences between the frameworks, such as Caffe's symmetric padding, but I didn't even use padding here. Yet the output of Caffe is different from the output of Keras...

Why is this so? I know that the Theano backend performs true convolution rather than cross-correlation as Caffe does, and hence requires the kernel to be rotated by 180 degrees, but is it the same for TensorFlow? From what I know, both TensorFlow and Caffe use cross-correlation instead of convolution.
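To make the cross-correlation versus convolution point concrete, here is a minimal NumPy sketch. It uses hand-written reference loops, not the Keras, Theano, or Caffe implementations, and the function names are hypothetical. It shows that a true convolution gives the same result as a cross-correlation with the kernel rotated by 180 degrees.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel without flipping it."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(image, kernel):
    """'Valid' 2D convolution written directly from its definition."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            for m in range(kh):
                for n in range(kw):
                    # the kernel index runs opposite to the image index
                    out[i, j] += image[i + kh - 1 - m, j + kw - 1 - n] * kernel[m, n]
    return out

rng = np.random.default_rng(0)
image = rng.random((6, 6))
kernel = rng.random((3, 3))

rotated = np.rot90(kernel, 2)  # rotate the kernel by 180 degrees
same_after_rotation = np.allclose(convolve2d(image, kernel),
                                  cross_correlate2d(image, rotated))
same_without_rotation = np.allclose(convolve2d(image, kernel),
                                    cross_correlate2d(image, kernel))
print(same_after_rotation, same_without_rotation)
```

If two frameworks both use cross-correlation, no kernel rotation is needed when moving weights between them, so any remaining mismatch has to come from somewhere else.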

How could I make two identical models in Keras and Caffe that use convolution?

Any help would be appreciated, thanks!

ShellRox · Accepted Answer · 2019-02-17T18:14:25

I found the problem, but I'm not sure how to fix it yet...

The difference between these two convolutional layers is in how the items of their outputs are aligned. The misalignment only shows up when the number of filters N satisfies N > 1 and N > S, where S is the dimension of the filter (kernel). In other words, the problem only occurs when the convolution produces a multi-dimensional array with more than one row and more than one column.

Evidence:

To see this, I simplified my input and output data so that we can better analyze the mechanics of both layers.

simpler.prototxt:

input: "input"
input_shape {
  dim: 1
  dim: 1
  dim: 2
  dim: 2
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "input"
  top: "conv1"
  convolution_param {
    num_output: 2
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}

simple.py:

import keras
import caffe
import numpy as np
from keras.layers import Input, Conv2D
from keras.activations import relu
from keras import Model

filters = 2  # number of filters; greater than 1 and greater than ker_size
ker_size = 1

_input = np.arange(2 * 2).reshape(2, 2)

# Weights for Keras: the kernel is an array of 2's, the bias is an array of 0's
_kernel = np.full((ker_size, ker_size, 1, filters), 2)
_bias = np.zeros((filters,), dtype=int)
_weights = [_kernel, _bias]
_weights_caffe = [_weights[0].T, _weights[1].T]  # just transpose them for Caffe

# Keras Setup

keras_input = Input(shape=(2, 2, 1), dtype='float32')
keras_conv = Conv2D(filters=filters, kernel_size=ker_size, strides=(1, 1), activation=relu, padding='valid')(keras_input)
model = Model(inputs=[keras_input], outputs=keras_conv)
model.layers[1].set_weights([_weights[0], _weights[1]])

# Caffe Setup

net = caffe.Net("simpler.prototxt", caffe.TEST)
net.params['conv1'][0].data[...] = _weights_caffe[0]
net.params['conv1'][1].data[...] = _weights_caffe[1]
net.blobs['input'].data[...] = _input.reshape(1, 1, 2, 2)


# Predictions


print("Input:\n---")
print(_input)
print(_input.shape)
print("\n")

print("Caffe:\n---")
print(net.forward()['conv1'])
print(net.forward()['conv1'].shape)
print("\n")

print("Keras:\n---")
print(model.predict([_input.reshape(1, 2, 2, 1)]))
print(model.predict([_input.reshape(1, 2, 2, 1)]).shape)
print("\n")

Output:

Input:
---
[[0 1]
 [2 3]]
(2, 2)


Caffe:
---
[[[[0. 2.]
   [4. 6.]]

  [[0. 2.]
   [4. 6.]]]]
(1, 2, 2, 2)


Keras:
---
[[[[0. 0.]
   [2. 2.]]

  [[4. 4.]
   [6. 6.]]]]
(1, 2, 2, 2)

Analysis:

If you look at the output of the Caffe model, you'll notice that our 2x2 array is first duplicated (so that we have an array of two 2x2 arrays, one per filter), and then each element of those two arrays is multiplied by our weight (2). Something like this:

Original:

[[[[0. 1.]
   [2. 3.]]

  [[0. 1.]
   [2. 3.]]]]

Transformed:

[[[[(0 * 2) (1 * 2)]
   [(2 * 2) (3 * 2)]]

  [[(0 * 2) (1 * 2)]
   [(2 * 2) (3 * 2)]]]]

TensorFlow does something different: after doing the same thing as Caffe, it seems to align the 2D vectors of the output in ascending order. This seems like weird behavior, and I can't understand why it would do such a thing.
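One possible reading of the two printed outputs, offered as a sketch rather than a confirmed diagnosis, is that they hold the same values in different axis orders. Caffe blobs are laid out as (batch, channels, height, width), while Keras with the TensorFlow backend defaults to channels-last, (batch, height, width, channels). The check below uses only NumPy and the arrays copied from the Output section above.

```python
import numpy as np

# Output printed by Caffe, read as (batch, channels, height, width)
caffe_out = np.array([[[[0., 2.],
                        [4., 6.]],
                       [[0., 2.],
                        [4., 6.]]]])

# Output printed by Keras, read as (batch, height, width, channels)
keras_out = np.array([[[[0., 0.],
                        [2., 2.]],
                       [[4., 4.],
                        [6., 6.]]]])

# Move the channel axis to position 1 (NHWC -> NCHW)
keras_as_nchw = np.transpose(keras_out, (0, 3, 1, 2))
print(np.array_equal(keras_as_nchw, caffe_out))  # True

# reshape keeps the memory order and only relabels the axes, so it does not help
keras_reshaped = keras_out.reshape(caffe_out.shape)
print(np.array_equal(keras_reshaped, caffe_out))  # False
```

Under this reading, the "ascending order" seen in the Keras output is just the two channel values for each pixel printed next to each other.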

Solution:

I have answered my own question about the cause of the problem, but I'm not aware of any clean solution yet. I still don't find my answer satisfying enough, so I'm going to accept an answer that has the actual solution.

The only solution I know of is creating a custom layer, which doesn't seem very neat to me.
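As an alternative to a custom layer, it may be enough to convert the arrays between the two layouts with np.transpose instead of reshape. The sketch below is not run against Keras or Caffe. It uses two hand-written NumPy reference convolutions (one channels-last, one channels-first) and assumes the usual conventions: Keras kernels shaped (kh, kw, c_in, c_out) and Caffe weights shaped (c_out, c_in, kh, kw). All helper names are hypothetical.

```python
import numpy as np

def keras_kernel_to_caffe(w):
    """(kh, kw, c_in, c_out) -> (c_out, c_in, kh, kw)."""
    return np.transpose(w, (3, 2, 0, 1))

def nhwc_to_nchw(x):
    """(batch, height, width, channels) -> (batch, channels, height, width)."""
    return np.transpose(x, (0, 3, 1, 2))

def conv_nhwc(x, w, b, stride):
    """Naive 'valid' cross-correlation, channels-last (Keras-style layout)."""
    n, h, wd, _ = x.shape
    kh, kw, _, c_out = w.shape
    oh, ow = (h - kh) // stride + 1, (wd - kw) // stride + 1
    out = np.zeros((n, oh, ow, c_out))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            patch = x[:, r:r + kh, c:c + kw, :]          # (n, kh, kw, c_in)
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2])) + b
    return out

def conv_nchw(x, w, b, stride):
    """Naive 'valid' cross-correlation, channels-first (Caffe-style layout)."""
    n, _, h, wd = x.shape
    c_out, _, kh, kw = w.shape
    oh, ow = (h - kh) // stride + 1, (wd - kw) // stride + 1
    out = np.zeros((n, c_out, oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            patch = x[:, :, r:r + kh, c:c + kw]          # (n, c_in, kh, kw)
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3])) + b
    return out

rng = np.random.default_rng(0)
x = rng.random((1, 9, 9, 3))     # small stand-in for the (1, 500, 500, 3) input
w = rng.random((3, 3, 3, 4))     # small stand-in for the (11, 11, 3, 96) kernel
b = rng.random(4)

ref = conv_nhwc(x, w, b, stride=2)

# Layout conversion with transpose: the two reference convolutions agree
via_transpose = conv_nchw(nhwc_to_nchw(x), keras_kernel_to_caffe(w), b, stride=2)
matches_transpose = np.allclose(nhwc_to_nchw(ref), via_transpose)

# Layout "conversion" with reshape, as in the question: values get scrambled
via_reshape = conv_nchw(x.reshape(1, 3, 9, 9), w.reshape(4, 3, 3, 3), b, stride=2)
matches_reshape = np.allclose(ref.reshape(via_reshape.shape), via_reshape)

print(matches_transpose, matches_reshape)
```

Applied to the question's snippets, this would mean replacing w.reshape(96, 3, 11, 11) with a (3, 2, 0, 1) transpose, x.reshape(1, 3, 500, 500) with a (0, 3, 1, 2) transpose of the batched input, and the final Keras reshape with a (0, 3, 1, 2) transpose. I have not verified this against real Caffe and Keras installations.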
