2
votes

I am trying to train a CNN model for OCR using Keras. I preprocessed the images by converting them to grayscale, removing noise, and then binarizing them, since as far as I know binary images work better for OCR. The problem is that a binary image has only 2 dimensions (no channel dimension), while Conv2D in Keras (and conv layers in general) requires 3 dimensions per image. How can I add a dimension while keeping the image binary? I am using cv2 for image processing, so solutions using that are preferred. Also, am I right that a binary image dataset is better for OCR?

1
Change the DNN architecture to use only one channel. Or add redundant channels, but this will make your model unnecessarily complex. - Micka
@Micka But the Conv2D layer of Keras requires 3 input dimensions. How can I change that? As for adding redundant channels, how do I add them? - Shantanu Shinde
According to the docs: "When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers, does not include the batch axis), e.g. input_shape=(128, 128, 3) for 128x128 RGB pictures in data_format="channels_last"." So I think you could use input_shape=(height,width,1) for your grayscale or binary data? Sorry, from my side this is only theoretical, and I don't know how to duplicate channels in Python. - Micka
@Micka I am using binary, not grayscale. - Shantanu Shinde
Yes, but it will be used as grayscale. The important thing is that it has only 1 channel. That's the 1 in input_shape=(height,width,1). - Micka
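For the "redundant channels" option raised in the comments, here is a minimal NumPy sketch. It assumes the binarized image is a 2-D uint8 array holding only 0 and 255; the array below is a hypothetical stand-in rather than a real cv2 result.

```python
import numpy as np

# Hypothetical stand-in for a binarized image: 2-D uint8 array of 0s and 255s.
binary = np.zeros((32, 64), dtype=np.uint8)
binary[8:24, 16:48] = 255

# Add a trailing axis, then copy the single plane three times along it.
three_channel = np.repeat(binary[:, :, np.newaxis], 3, axis=2)

print(three_channel.shape)  # (32, 64, 3)
```

Every channel is an identical copy, so the image stays binary, but as noted above the extra channels carry no new information.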

1 Answer

0
votes

I found a solution. I used the NumPy function numpy.expand_dims() to add a channel dimension of size 1, so the image shape became (height, width, 1). Here is what I did:

import numpy as np
img = np.expand_dims(img, axis=2)  # (height, width) -> (height, width, 1)
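The same idea extends to a whole dataset. The sketch below is only an illustration under a few assumptions: every image is a 2-D uint8 array of the same size containing only 0 and 255, and the model's first layer is declared with input_shape=(height, width, 1) as suggested in the comments. The helper name to_model_input is made up for this example.

```python
import numpy as np

def to_model_input(images):
    """Stack 2-D binary images (values 0/255) into an (N, H, W, 1) float32 batch of 0.0/1.0."""
    # np.stack requires all images to share the same (H, W) shape.
    batch = np.stack(images).astype(np.float32) / 255.0
    # Append the channel axis last, matching data_format="channels_last".
    return np.expand_dims(batch, axis=-1)

# Hypothetical stand-ins for two binarized images.
imgs = [
    np.full((28, 28), 255, dtype=np.uint8),
    np.zeros((28, 28), dtype=np.uint8),
]
x = to_model_input(imgs)
print(x.shape)  # (2, 28, 28, 1)
```

Scaling to 0.0/1.0 is optional and is an assumption added here; the values remain two-level either way.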