8
votes

I'm following the TensorFlow Object Detection API instructions and trying to train an existing object detection model ("faster_rcnn_resnet101_coco") on my own dataset, which has 50 classes.

So, based on my own dataset, I created:

  1. TFRecords (for training, evaluation, and testing, separately)
  2. labelmap.pbtxt (see the example just below)
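For reference, the entries in labelmap.pbtxt follow the standard label map format (the class names below are just placeholders for my own labels):

item {
  id: 1
  name: 'class_1'
}
item {
  id: 2
  name: 'class_2'
}
... (and so on, up to id: 50)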

Next, I edited model.config, changing only the following fields:

  • model - faster_rcnn - num_classes: 90 -> 50 (the number of classes in my own dataset)
  • train_config - batch_size: 1 -> 10
  • train_config - num_steps: 200000 -> 100
  • train_input_reader - tf_record_input_reader - input_path: the path where the TFRecord resides
  • train_input_reader - label_map_path: the path where labelmap.pbtxt resides
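For reference, the edited parts of model.config look roughly like this (paths are placeholders; everything else is left as in the sample config):

model {
  faster_rcnn {
    num_classes: 50
    ...
  }
}
train_config {
  batch_size: 10
  num_steps: 100
  ...
}
train_input_reader {
  tf_record_input_reader {
    input_path: "PATH TO THE TRAIN TFRecord"
  }
  label_map_path: "PATH TO labelmap.pbtxt"
}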

Finally, I ran this command:

python train.py \
--logtostderr \
--pipeline_config_path="PATH WHERE CONFIG FILE RESIDES" \
--train_dir="PATH WHERE MODEL DIRECTORY RESIDES"

And I got the error below:

InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,890,600,3] vs. shape[1] = [1,766,600,3] [[Node: concat_1 = ConcatV2[N=10, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](Preprocessor/sub, Preprocessor_1/sub, Preprocessor_2/sub, Preprocessor_3/sub, Preprocessor_4/sub, Preprocessor_5/sub, Preprocessor_6/sub, Preprocessor_7/sub, Preprocessor_8/sub, Preprocessor_9/sub, concat_1/axis)]]

It seems to be related to the dimensions of the input images, so it may be caused by the raw image data not being resized.

But as far as I know, the model automatically resizes the input images for training (doesn't it?)

Then I'm stuck with this issue.

If there is a solution, I'd appreciate your answer. Thanks.

UPDATE

When I changed my batch_size field from 10 back to 1 (the original value), it seems to train without any problem... but I don't understand why...

3
see the config files in that repo; the batch size is 1, following the Faster R-CNN paper. A bigger batch size will consume too much memory. – Jie.Zhou
@Jie.Zhou Here is my "model.config" file: pastebin.com/4An9HsPK. As I stated above, a few things have been changed. – LKM
I think the code is probably written for a single image as input, so if you change the batch size to an integer bigger than one, the error will be raised because of some internal assumption. – Jie.Zhou
Do you mean that "the code" from TensorFlow, not my own, is written for a single image, because the Faster R-CNN paper processes the batch as a single image? – LKM
That's exactly what I mean. – Jie.Zhou

3 Answers

14
votes

TaeWoo is right, you have to set batch_size to 1 in order to train Faster RCNN.

This is because Faster R-CNN uses a keep_aspect_ratio_resizer, which means that if you have images of different sizes, they will also have different sizes after preprocessing. This practically makes batching impossible, since a batch tensor has shape [num_batch, height, width, channels]; you can see this is a problem when (height, width) differ from one example to the next.

This is in contrast to the SSD model, which uses a "normal" resizer, i.e. regardless of the input image, all preprocessed examples end up having the same size, which allows them to be batched together.
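For illustration, this is roughly how the two resizers appear in the released sample configs (the exact values may differ in your config):

# Faster R-CNN: the output size depends on the input's aspect ratio
image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}

# SSD: every image is resized to the same fixed shape
image_resizer {
  fixed_shape_resizer {
    height: 300
    width: 300
  }
}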

Now, if you have images of different sizes, you practically have two ways of using batching:

  • use Faster R-CNN and pad your images beforehand, either once before training or continuously as a preprocessing step. I'd suggest the former, since this type of preprocessing seems to slow down training a lot (see the sketch after this list)
  • use SSD, but make sure your objects are not affected too much by the distortion. This shouldn't be a very big problem, and it can even act as a form of data augmentation.
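A minimal sketch of the one-time padding approach, assuming plain image files on disk and a target size you pick to fit your largest image (the directory names and target size here are just examples):

import os
from PIL import Image

TARGET_W, TARGET_H = 1024, 1024  # pick a size at least as large as your biggest image

os.makedirs("images_padded", exist_ok=True)

for name in os.listdir("images"):
    img = Image.open(os.path.join("images", name)).convert("RGB")
    # Paste the original image onto a black canvas of the fixed target size;
    # pasting at (0, 0) keeps the absolute pixel coordinates of the boxes unchanged.
    canvas = Image.new("RGB", (TARGET_W, TARGET_H))
    canvas.paste(img, (0, 0))
    canvas.save(os.path.join("images_padded", name))

Keep in mind that the box coordinates stored in the TFRecord are normalized to the image width and height, so they have to be recomputed relative to the padded size.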
4
votes

I had the same problem. Setting batch_size=1 does indeed seem to solve the problem, but I am not sure whether this has any effect on the accuracy of the model. Would love to get the TF team's answer to this.

0
votes

I had a similar problem that I want to share; maybe it will help others in similar situations. I changed an SSD object detection net to predict bounding boxes with a fifth variable, which is an angle. The problem was that we inserted an empty list for the angle variable of the bounding box, and then I got a problem in the tf.concat operation:

Dimensions of inputs should match: shape[0] = [1,43] vs. shape[4] = [1,0]

(shape[0] changed if I reran the session, but shape[4] stayed the same, [1,0])

I fixed the problem by fixing my TFRecord to have a list of angles with the same length as the other bbox variables (xmin, xmax, ymin, ymax).
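In other words, every per-box list in the tf.train.Example has to have the same number of entries. A minimal sketch of what a consistent example looks like (the angle feature key is my own custom addition, not part of the standard format):

import tensorflow as tf

# Two boxes in this image, so every per-box list has exactly two entries.
xmins = [0.1, 0.4]
xmaxs = [0.3, 0.9]
ymins = [0.2, 0.1]
ymaxs = [0.5, 0.6]
angles = [0.0, 30.0]  # must NOT be an empty list; one angle per box

feature = {
    'image/object/bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=xmins)),
    'image/object/bbox/xmax': tf.train.Feature(float_list=tf.train.FloatList(value=xmaxs)),
    'image/object/bbox/ymin': tf.train.Feature(float_list=tf.train.FloatList(value=ymins)),
    'image/object/bbox/ymax': tf.train.Feature(float_list=tf.train.FloatList(value=ymaxs)),
    'image/object/bbox/angle': tf.train.Feature(float_list=tf.train.FloatList(value=angles)),
}
example = tf.train.Example(features=tf.train.Features(feature=feature))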

Hope it helps someone; it took me a whole day to find the problem.

Regards, Alon