8
votes

I'm following the TensorFlow Object Detection API instructions and trying to train an existing object detection model ("faster_rcnn_resnet101_coco") on my own dataset, which has 50 classes.

So, based on my own dataset, I created:

  1. TFRecords (for training, evaluation, and testing, separately)
  2. labelmap.pbtxt (see the example just below)
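For reference, the entries in labelmap.pbtxt follow the standard label map format (the class names below are just placeholders for my own labels):

item {
  id: 1
  name: 'class_1'
}
item {
  id: 2
  name: 'class_2'
}
... (and so on, up to id: 50)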

Next, I edited model.config, changing only the following fields:

  • model - faster_rcnn - num_classes: 90 -> 50 (the number of classes in my own dataset)
  • train_config - batch_size: 1 -> 10
  • train_config - num_steps: 200000 -> 100
  • train_input_reader - tf_record_input_reader - input_path: the path where the TFRecord resides
  • train_input_reader - label_map_path: the path where labelmap.pbtxt resides
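For reference, the edited parts of model.config look roughly like this (paths are placeholders; everything else is left as in the sample config):

model {
  faster_rcnn {
    num_classes: 50
    ...
  }
}
train_config {
  batch_size: 10
  num_steps: 100
  ...
}
train_input_reader {
  tf_record_input_reader {
    input_path: "PATH TO THE TRAIN TFRecord"
  }
  label_map_path: "PATH TO labelmap.pbtxt"
}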

Finally, I ran this command:

python train.py \
--logtostderr \
--pipeline_config_path="PATH WHERE CONFIG FILE RESIDES" \
--train_dir="PATH WHERE MODEL DIRECTORY RESIDES"

And I got the error below:

InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,890,600,3] vs. shape[1] = [1,766,600,3] [[Node: concat_1 = ConcatV2[N=10, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](Preprocessor/sub, Preprocessor_1/sub, Preprocessor_2/sub, Preprocessor_3/sub, Preprocessor_4/sub, Preprocessor_5/sub, Preprocessor_6/sub, Preprocessor_7/sub, Preprocessor_8/sub, Preprocessor_9/sub, concat_1/axis)]]

It seems to be related to the dimensions of the input images, so it may be caused by the raw image data not being resized.

But as far as I know, the model automatically resizes the input images for training (doesn't it?)

Then I'm stuck with this issue.

If there is a solution, I'd appreciate your answer. Thanks.

UPDATE

When I changed my batch_size field from 10 back to 1 (the original value), it seems to train without any problem... but I don't understand why...

3
see the config files in that repo; the batch size is 1, following the Faster R-CNN paper. A bigger batch size will consume too much memory. – Jie.Zhou
@Jie.Zhou Here is my "model.config" file: pastebin.com/4An9HsPK. As I stated above, a few things have been changed. – LKM
I think the code is probably written for a single image as input, so if you change the batch size to an integer bigger than one, the error will be raised because of some internal assumption. – Jie.Zhou
Do you mean that "the code" from TensorFlow, not my own, is written for a single image, because the Faster R-CNN paper processes the batch as a single image? – LKM
That's exactly what I mean. – Jie.Zhou

3 Answers

14
votes

TaeWoo is right, you have to set batch_size to 1 in order to train Faster RCNN.

This is because Faster R-CNN uses a keep_aspect_ratio_resizer, which means that if you have images of different sizes, they will also have different sizes after preprocessing. This practically makes batching impossible, since a batch tensor has shape [num_batch, height, width, channels]; you can see this is a problem when (height, width) differ from one example to the next.

This is in contrast to the SSD model, which uses a "normal" resizer, i.e. regardless of the input image, all preprocessed examples end up having the same size, which allows them to be batched together.
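For illustration, this is roughly how the two resizers appear in the released sample configs (the exact values may differ in your config):

# Faster R-CNN: the output size depends on the input's aspect ratio
image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}

# SSD: every image is resized to the same fixed shape
image_resizer {
  fixed_shape_resizer {
    height: 300
    width: 300
  }
}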

Now, if you have images of different sizes, you practically have two ways of using batching:

  • use Faster R-CNN and pad your images beforehand, either once before training or continuously as a preprocessing step. I'd suggest the former, since this type of preprocessing seems to slow down training a lot (see the sketch after this list)
  • use SSD, but make sure your objects are not affected too much by the distortion. This shouldn't be a very big problem, and it can even act as a form of data augmentation.
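A minimal sketch of the one-time padding approach, assuming plain image files on disk and a target size you pick to fit your largest image (the directory names and target size here are just examples):

import os
from PIL import Image

TARGET_W, TARGET_H = 1024, 1024  # pick a size at least as large as your biggest image

os.makedirs("images_padded", exist_ok=True)

for name in os.listdir("images"):
    img = Image.open(os.path.join("images", name)).convert("RGB")
    # Paste the original image onto a black canvas of the fixed target size;
    # pasting at (0, 0) keeps the absolute pixel coordinates of the boxes unchanged.
    canvas = Image.new("RGB", (TARGET_W, TARGET_H))
    canvas.paste(img, (0, 0))
    canvas.save(os.path.join("images_padded", name))

Keep in mind that the box coordinates stored in the TFRecord are normalized to the image width and height, so they have to be recomputed relative to the padded size.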
4
votes

I had the same problem. Setting batch_size=1 does indeed seem to solve the problem, but I am not sure whether this has any effect on the accuracy of the model. Would love to get the TF team's answer to this.

0
votes

I had a similar problem that I want to share; maybe it will help others in similar situations. I changed an SSD object detection net to predict bounding boxes with a fifth variable, which is an angle. The problem was that we inserted an empty list for the angle variable of the bounding box, and then I got a problem in the tf.concat operation:

Dimensions of inputs should match: shape[0] = [1,43] vs. shape[4] = [1,0]

(shape[0] changed if I reran the session, but shape[4] stayed the same, [1,0])

I fixed the problem by fixing my TFRecord to have a list of angles with the same length as the other bbox variables (xmin, xmax, ymin, ymax).
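In other words, every per-box list in the tf.train.Example has to have the same number of entries. A minimal sketch of what a consistent example looks like (the angle feature key is my own custom addition, not part of the standard format):

import tensorflow as tf

# Two boxes in this image, so every per-box list has exactly two entries.
xmins = [0.1, 0.4]
xmaxs = [0.3, 0.9]
ymins = [0.2, 0.1]
ymaxs = [0.5, 0.6]
angles = [0.0, 30.0]  # must NOT be an empty list; one angle per box

feature = {
    'image/object/bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=xmins)),
    'image/object/bbox/xmax': tf.train.Feature(float_list=tf.train.FloatList(value=xmaxs)),
    'image/object/bbox/ymin': tf.train.Feature(float_list=tf.train.FloatList(value=ymins)),
    'image/object/bbox/ymax': tf.train.Feature(float_list=tf.train.FloatList(value=ymaxs)),
    'image/object/bbox/angle': tf.train.Feature(float_list=tf.train.FloatList(value=angles)),
}
example = tf.train.Example(features=tf.train.Features(feature=feature))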

Hope it helps someone; it took me a whole day to find the problem.

Regards, Alon