I am running OCR on subtitles in TV footage, using Tesseract 3.x with C++. As a preprocessing step for OCR, I am trying to separate the text from the background.

Here's the original image:

enter image description here

And, preprocessed image:

enter image description here

The OCR result is: Sicemn clone

As the preprocessed image above shows, some "fog" remains around the letters, which prevents the OCR module from doing its job properly.

Is there any way to detect this "fog" programmatically and remove it, or some image processing that would remove or reduce it in the preprocessed image?

Since the preprocessing logic is heavily tuned to handle many different images, I would rather find a way to "clean" the preprocessed image than modify the preprocessing logic (tuning it for this picture could affect the results for other pictures).

Any suggestion is very welcome.


Update

sixela's answer is great and works in most cases. The case where it does not work is when the background also contains colors similar to the text.

Example of a case where it does not work:

enter image description here

Example of result:

enter image description here

The Gaussian filter seems to cause a problem with this type of footage. This implies that different footage may require different approaches.

The caption likely remains the same for several consecutive frames while the background changes. You might try separating the pixels that remain approximately the same from those that change over n frames. It won't be perfect, but it might help in many circumstances. - Adrian McCarthy
Hi Adrian, thanks for your advice. Yes, I already tried that method using absdiff and the MOG filter from OpenCV. It didn't work well because, in typical footage, objects in the frame do not move very quickly. For example, if a subtitle overlaps people, they will not move much, but the subtitle only appears for 1-2 seconds. In a car chase scene it would work, though. - Aki24x
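
The idea in the first comment can be sketched roughly as follows. This is only an illustration, assuming the frames are same-sized greyscale NumPy arrays; the function name `stable_pixel_mask` and the tolerance `tol` are hypothetical and not taken from the thread or from OpenCV.

```python
import numpy as np

def stable_pixel_mask(frames, tol=10):
    """Return a boolean mask of pixels whose value stays within `tol`
    across all frames (hypothetical helper, not an OpenCV function)."""
    # Stack to shape (n, H, W); widen the type so subtraction cannot wrap around.
    stack = np.stack(frames).astype(np.int16)
    # Per-pixel spread between the brightest and darkest value over time.
    spread = stack.max(axis=0) - stack.min(axis=0)
    return spread <= tol

# Synthetic example: the background brightness changes in every frame,
# while two "caption" pixels stay white throughout.
frames = []
for background in (20, 90, 160):
    frame = np.full((4, 4), background, dtype=np.uint8)
    frame[1, 1:3] = 255
    frames.append(frame)

mask = stable_pixel_mask(frames)
```

As the reply points out, a slowly moving background also passes this test, so a mask like this would at best narrow down the candidate text pixels.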

1 Answer

I managed to get a cleaner (though not perfect) image by using morphological operations and thresholding.

Here is how:

  1. Convert the original image to greyscale
  2. Apply a Gaussian blur (9x9 kernel) to denoise the greyscale image
  3. Apply a top-hat morphological operation (3x3 kernel) to extract the white text
  4. Threshold with Otsu's method
  5. Dilate
  6. Apply an inverted binary threshold to turn the white text black
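
To see why step 3 picks out thin bright strokes, recall that a white top-hat is the image minus its morphological opening (erosion followed by dilation). The NumPy sketch below is only an illustration of that definition with a square kernel, not OpenCV's implementation; the helper names are made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _window_reduce(img, k, reduce_fn, pad_value):
    # Pad so the output has the same size as the input, then reduce each k x k window.
    pad = k // 2
    padded = np.pad(img, pad, mode="constant", constant_values=pad_value)
    windows = sliding_window_view(padded, (k, k))
    return reduce_fn(windows, axis=(-2, -1))

def erode(img, k=3):
    # Minimum over each window; pad with 255 so the border does not shrink shapes.
    return _window_reduce(img, k, np.min, pad_value=255)

def dilate(img, k=3):
    # Maximum over each window; pad with 0 so the border does not grow shapes.
    return _window_reduce(img, k, np.max, pad_value=0)

def top_hat(img, k=3):
    # Opening never exceeds the original, so the uint8 subtraction cannot underflow.
    opened = dilate(erode(img, k), k)
    return img - opened

# Synthetic image: a broad bright block (background clutter)
# and a one-pixel-wide bright stroke (stand-in for a letter stroke).
img = np.zeros((7, 12), dtype=np.uint8)
img[1:6, 1:6] = 200
img[1:6, 9] = 255

th = top_hat(img, k=3)
```

In this toy image the block is at least as wide as the kernel, so the opening reproduces it and the top-hat cancels it, while the one-pixel stroke is erased by the erosion and therefore survives the subtraction.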

I finally obtained the following image:

enter image description here

Running OCR on it gives this text: "Since vou don'k"

PS: This result can of course be improved by tweaking the parameters (the kernel size, for example), but I hope it can guide you. I used OpenCV in Python to quickly try out these methods.

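import cv2

# Step 1: load the image directly as greyscale
image = cv2.imread('./inputImg.png', cv2.IMREAD_GRAYSCALE)
if image is None:
    raise FileNotFoundError('./inputImg.png')
# Step 2: 9x9 Gaussian blur to denoise
imgBlur = cv2.GaussianBlur(image, (9, 9), 0)
# Step 3: top-hat with a 3x3 rectangular kernel keeps small bright details
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
imgTH = cv2.morphologyEx(imgBlur, cv2.MORPH_TOPHAT, kernel)
# Step 4: Otsu chooses the threshold automatically (the 0 is ignored)
_, imgBin = cv2.threshold(imgTH, 0, 250, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# Step 5: dilate to thicken the strokes
imgdil = cv2.dilate(imgBin, kernel)
# Step 6: invert so the text is black on a light background
_, imgBin_Inv = cv2.threshold(imgdil, 0, 250, cv2.THRESH_BINARY_INV)

cv2.imshow('original', image)
cv2.imshow('bin', imgBin)
cv2.imshow('dil', imgdil)
cv2.imshow('inv', imgBin_Inv)

cv2.imwrite('./output.png', imgBin_Inv)
cv2.waitKey(0)  # keep the preview windows open until a key is pressed
cv2.destroyAllWindows()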

After this, I ran Tesseract on the output image with this command:

tesseract output.png stdout