I am currently working on a simple task where I have to read characters from simple images. Used tesseract in python to write following code which works on all simple and complex images having English text but fails on a particular image, can anyone tell me why the following python code can not read characters from the following image , as you can see with 6 black sections having 6 characters in each box, character colour is white.
from PIL import Image
import pytesseract
import argparse
import cv2
import os
ap = argparse.ArgumentParser()
ap.add_argument("-i","--image",required=True,help="path to input image to be OCR'd")
ap.add_argument("-p","-- preprocess",type=str,default="thresh",help="type of preprocessing to be done")
args=vars(ap.parse_args())
image = cv2.imread(args["image"])
gray2 = cv2.BackgroundSubtractor()
gray2 = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
gray2 = cv2.threshold(gray2, 100, 255, cv2.THRESH_BINARY_INV)[1]
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 3))
sqKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
if args["preprocess"] == "thresh":
gray2 = cv2.threshold(gray2, 2, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
elif args["preprocess"] == "blur":
gray2 = cv2.medianBlur(gray2,3)
filename = "{}.png".format(os.getpid())
cv2.imwrite(filename,gray2)
text = pytesseract.image_to_string(Image.open(filename))
print(text)
If this image is used, it gives perfect result
0123456789 ABCDEFGHIJKL MNOPQRSTUV WXYZ
When I am trying to use the following image ,
Processed Image which does not give result the code completely ignores all characters in the image. The only difference is the first image does not have boxes like the second image where each black box contains 6 characters
Can someone please help and tell me what am I missing why this code fails for the second image, Thanks.
cv2.imwrite(filename,gray2)
) – Dmitrii Z.