I have the following image, which I want to feed to Tesseract to detect the text:
Input Image:
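I am binarizing this image with OpenCV thresholding. I call it the OTSU transformation, although the image I actually save (th3) comes from adaptive Gaussian thresholding rather than Otsu's method. The code is as follows: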
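import cv2
import glob

for img in glob.glob("/home/image.jpg"):
    # Read the image as 8-bit grayscale
    cv_img = cv2.imread(img, cv2.IMREAD_GRAYSCALE)
    if cv_img is None:  # missing or unreadable file
        continue
    # 5x5 median blur to suppress speckle noise before thresholding
    cv_img = cv2.medianBlur(cv_img, 5)
    # Global threshold at a fixed value of 127
    ret, th1 = cv2.threshold(cv_img, 127, 255, cv2.THRESH_BINARY)
    # Adaptive thresholds: 11x11 neighbourhood, constant C = 2
    th2 = cv2.adaptiveThreshold(cv_img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY, 11, 2)
    th3 = cv2.adaptiveThreshold(cv_img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                cv2.THRESH_BINARY, 11, 2)
    # Despite the file name, this saves the adaptive Gaussian result, not Otsu
    cv2.imwrite('OTSU.jpg', th3)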
The output image I get after this transform is:
Here are my conditions. The major hurdle I am facing is the white lines that appear at random positions on the image. The real images may contain more numbers than 1993, and the number of white lines crossing the text may also increase. I want to clean these image files so that they are ready for Tesseract OCR.
I have tried Canny edge detection on the original image, which seems to give the outlines.
How do I clean the input image to get rid of the white lines overlapping the text? My aim is to run it through Tesseract OCR.