I'm trying to replicate a large Caffe network in Keras (with the TensorFlow backend), but I'm having a lot of trouble even with a single convolutional layer.
Simple Convolution in General:
Let's say we have a 4D input of shape (1, 500, 500, 3) and need to perform a single convolution on it with 96 filters, a kernel size of 11 and a 4x4 stride.
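For reference, the spatial size of the output follows from the input size, kernel size and stride alone. The sketch below assumes Caffe's integer-division formula for its Convolution layer and TensorFlow's rule for `padding='valid'`; the helper names are hypothetical. Both give 123 for this configuration, which matches the 123x123 maps used in the snippets.

```python
import math


def caffe_conv_out(size, kernel, stride, pad=0, dilation=1):
    # Caffe's Convolution layer: floor division of the padded extent
    extent = dilation * (kernel - 1) + 1
    return (size + 2 * pad - extent) // stride + 1


def keras_valid_out(size, kernel, stride):
    # TensorFlow / Keras padding='valid': ceil((size - kernel + 1) / stride)
    return math.ceil((size - kernel + 1) / stride)


print(caffe_conv_out(500, 11, 4))   # 123
print(keras_valid_out(500, 11, 4))  # 123
```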
Let's set our weight and input variables:
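import numpy as np
w = np.random.rand(11, 11, 3, 96)  # weights 1: kernel, (kernel_h, kernel_w, in_channels, filters)
b = np.random.rand(96)  # weights 2: bias, one value per filter
x = np.random.rand(500, 500, 3)  # input image, (height, width, channels)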
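To have a framework-independent ground truth, one option is a plain NumPy reference of the same layer. This is a minimal sketch, assuming the Keras layout (input `(N, H, W, C)`, kernel `(kH, kW, C_in, C_out)`) and cross-correlation without any kernel flip; the function name `conv2d_valid_nhwc` is hypothetical, and the explicit loop favors readability over speed.

```python
import numpy as np


def conv2d_valid_nhwc(x, w, b, stride):
    """Strided 'valid' cross-correlation, then bias, then ReLU.

    x: (N, H, W, C_in), w: (kH, kW, C_in, C_out), b: (C_out,)
    Returns an array of shape (N, H_out, W_out, C_out).
    """
    n, h, wd, _ = x.shape
    kh, kw, _, c_out = w.shape
    h_out = (h - kh) // stride + 1
    w_out = (wd - kw) // stride + 1
    out = np.empty((n, h_out, w_out, c_out))
    for i in range(h_out):
        for j in range(w_out):
            # receptive field of output position (i, j)
            patch = x[:, i * stride:i * stride + kh, j * stride:j * stride + kw, :]
            # sum over kH, kW and C_in; the kernel is used as-is (no flip)
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return np.maximum(out + b, 0.0)  # add bias per filter, then ReLU
```

With the arrays above, `conv2d_valid_nhwc(x[np.newaxis], w, b, stride=4)` should have shape `(1, 123, 123, 96)`, i.e. the same layout as the Keras prediction.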
Simple Convolution in Keras:
This is how it could be defined in Keras:
import keras  # needed for keras.activations and keras.Model below
from keras.layers import Input
from keras.layers import Conv2D
import numpy as np
inp = Input(shape=(500, 500, 3))
conv1 = Conv2D(filters=96, kernel_size=11, strides=(4, 4), activation=keras.activations.relu, padding='valid')(inp)
model = keras.Model(inputs=[inp], outputs=conv1)
model.layers[1].set_weights([w, b]) # set weights for convolutional layer
predicted = model.predict([x.reshape(1, 500, 500, 3)])
print(predicted.reshape(1, 96, 123, 123)) # reshape keras output in the form of Caffe
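Keras with the TensorFlow backend defaults to `data_format='channels_last'`, i.e. `(N, H, W, C)`, while Caffe blobs are `(N, C, H, W)`. Because `reshape` only reinterprets the same flat buffer, it may be worth checking on a tiny array whether a given reshape actually moves the channel axis. The sketch below compares it with `transpose` on a made-up `(1, 2, 3, 4)` array.

```python
import numpy as np

# Made-up Keras-style output in channels_last layout: (N, H, W, C)
nhwc = np.arange(2 * 3 * 4).reshape(1, 2, 3, 4)

# reshape keeps the flat memory order and only relabels the axes
reshaped = nhwc.reshape(1, 4, 2, 3)
# transpose moves the channel axis in front: (N, C, H, W)
transposed = nhwc.transpose(0, 3, 1, 2)

# Channel 0 of the original output, as an (H, W) map
print(nhwc[0, :, :, 0].tolist())   # [[0, 4, 8], [12, 16, 20]]
print(transposed[0, 0].tolist())   # [[0, 4, 8], [12, 16, 20]]
print(reshaped[0, 0].tolist())     # [[0, 1, 2], [3, 4, 5]] (values from several channels)
```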
Simple Convolution in Caffe:
simple.prototxt:
name: "simple"
input: "inp"
input_shape {
  dim: 1
  dim: 3
  dim: 500
  dim: 500
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "inp"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    pad: 0
    stride: 4
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
Caffe in Python:
import caffe
net = caffe.Net('simple.prototxt', caffe.TEST)
net.params['conv1'][0].data[...] = w.reshape(96, 3, 11, 11) # set weights 1
net.params['conv1'][1].data[...] = b # set weights 2 (bias)
net.blobs['inp'].reshape(1, 3, 500, 500) # reshape input layer to fit our input array x
print(net.forward(inp=x.reshape(1, 3, 500, 500)).get('conv1'))
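Caffe stores convolution weights as `(num_output, channels, kernel_h, kernel_w)`, whereas a Keras `Conv2D` kernel is `(kernel_h, kernel_w, in_channels, filters)`. Assuming those two layouts, a small check on a toy kernel shows which conversion keeps each filter's `(kH, kW)` slice intact; the shapes here are made up for illustration.

```python
import numpy as np

# Toy kernel in Keras layout: (kH, kW, C_in, C_out) = (2, 2, 3, 4)
w_keras = np.arange(2 * 2 * 3 * 4).reshape(2, 2, 3, 4)

# Candidate conversions to Caffe layout: (C_out, C_in, kH, kW)
w_transposed = w_keras.transpose(3, 2, 0, 1)
w_reshaped = w_keras.reshape(4, 3, 2, 2)

# Slice of filter o applied to input channel c, as a (kH, kW) patch
o, c = 1, 2
print(np.array_equal(w_transposed[o, c], w_keras[:, :, c, o]))  # True
print(np.array_equal(w_reshaped[o, c], w_keras[:, :, c, o]))    # False
```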
Problem:
If we execute both snippets, we notice that the outputs differ. I understand there are a few differences between the two frameworks, such as Caffe's symmetric padding, but I'm not using any padding here. Yet the Caffe output is still different from the Keras output...
Why is this? I know that the Theano backend doesn't use cross-correlation the way Caffe does, so it requires the kernel to be rotated by 180 degrees. But is the same true for TensorFlow? From what I know, both TensorFlow and Caffe use cross-correlation rather than true convolution.
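The difference between the two operations comes down to kernel orientation: a true convolution equals a cross-correlation with the kernel rotated by 180 degrees. The minimal NumPy sketch below (single channel, stride 1, no padding; the helper names are hypothetical) makes that concrete, so a flip should only be needed between a backend that convolves and one that cross-correlates.

```python
import numpy as np


def correlate2d_valid(x, k):
    # Cross-correlation: slide k over x without flipping it
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return np.array([[np.sum(x[i:i + kh, j:j + kw] * k) for j in range(w)]
                     for i in range(h)])


def convolve2d_valid(x, k):
    # True convolution: cross-correlation with the kernel rotated by 180 degrees
    return correlate2d_valid(x, np.rot90(k, 2))


x = np.arange(16.0).reshape(4, 4)
k = np.array([[1.0, 0.0], [0.0, 0.0]])  # weight only on the top-left tap

print(correlate2d_valid(x, k).tolist())  # top-left element of each 2x2 window
print(convolve2d_valid(x, k).tolist())   # bottom-right element of each 2x2 window
```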
How can I build two identical convolutional models in Keras and Caffe?
Any help would be appreciated, thanks!