
I have the following image, which I want to feed to Tesseract to detect the text:

Input Image:

[image]

I am preprocessing this image with a median blur followed by adaptive thresholding (the result is saved as OTSU.jpg). The code is as follows:

import cv2
import numpy as np
from matplotlib import pyplot as plt
import glob

for img_path in glob.glob("/home/image.jpg"):
    # Read as grayscale (flag 0); the threshold functions need one channel.
    cv_img = cv2.imread(img_path, 0)
    cv_img = cv2.medianBlur(cv_img, 5)

    ret, th1 = cv2.threshold(cv_img, 127, 255, cv2.THRESH_BINARY)
    th2 = cv2.adaptiveThreshold(cv_img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY, 11, 2)
    th3 = cv2.adaptiveThreshold(cv_img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                cv2.THRESH_BINARY, 11, 2)

    cv2.imwrite('OTSU.jpg', th3)

The output image I get after this transform is:

[image]

Here is my situation: the major hurdle is the white lines that appear at random positions on the image. The actual images may contain more digits than just 1993, and the number of white lines interfering with the text may also increase. I want to clean these image files so that they are ready for Tesseract OCR.

I have tried Canny edge detection on the original image, which seems to give the outlines.

Canny edge detector output: [image]

How do I clean the input image to get rid of the white lines overlapping on the text? My aim is to run it through Tesseract OCR.


3 Answers


I think you should look into morphological transformations in OpenCV; they are covered in the OpenCV documentation.

I made a small piece of code, starting from your script, that uses erosion and dilation:

import cv2
import numpy as np

# Read as grayscale (flag 0), then smooth with a 5x5 median filter.
cv_img = cv2.imread('1993.jpg', 0)

cv_img = cv2.medianBlur(cv_img, 5)

ret, th1 = cv2.threshold(cv_img,127,255,cv2.THRESH_BINARY)
th2 = cv2.adaptiveThreshold(cv_img,255,cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY,11,2)
th3 = cv2.adaptiveThreshold(cv_img,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY,11,2)

# Erosion shrinks white regions; dilation grows them back.
# Note: a 1x1 dilation kernel leaves the image unchanged, so try a
# larger one (e.g. 3x3) if the eroded regions should be restored.
kernel_erosion = np.ones((3,3), np.uint8)
kernel_dilation = np.ones((1,1), np.uint8)
erosion = cv2.erode(th2, kernel_erosion, iterations=1)
dilation = cv2.dilate(erosion, kernel_dilation, iterations=1)

cv2.imwrite('morph.jpg', dilation)

You can play around with different kernels or different transformations. This is the output I got:

Erosion and dilation


Try using CLAHE before you threshold the image. This is what I tried:

import cv2
import numpy as np

image = cv2.imread("numbers.jpg")

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# CLAHE boosts local contrast before thresholding.
clahe = cv2.createCLAHE(clipLimit=10, tileGridSize=(5, 5))
gray = clahe.apply(gray)
ret, thresh = cv2.threshold(gray, 140, 150, cv2.THRESH_BINARY_INV)
# Opening removes small white specks; closing fills small vertical gaps.
morph = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, np.ones((5,5), np.uint8))
morph = cv2.morphologyEx(morph, cv2.MORPH_CLOSE, np.ones((3,1), np.uint8))



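You can also combine the threshold type with the cv2.THRESH_OTSU flag (for example cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU), in which case the threshold value is chosen automatically and the one you pass is ignored. Try playing around with the parameters and I'm sure you can make it work with all your images. Cheers!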


And one more result with cv2.ximgproc.niBlackThreshold:

[image]