I'm trying to use optical character recognition (OCR) to read text printed on digital video (DV) tapes. I'm using cropped still frames from the video for the OCR process. The text is white, but there are color artifacts (maybe composite color artifacts) so that the white text has color bleeding onto it (see example below). The colors look to be in magenta-cyan-yellow colorspace, maybe?

OCR results would likely improve if I could remove or filter out those colors so that only white remains on the text, and then create a binary black/white image. I can already produce a binary image, but I suspect that removing the colors from the white text first would help separate the text from the background image.

Are there any ways, preferably using ImageMagick, to filter out those colors from the white text? I'm not sure of the best way to approach this, since there are multiple colors bleeding and the background changes in each frame. I'm currently using ImageMagick version 6.9.2-3 Q16 x64 on Windows 7.

Sample full-frame image: Full still frame

Sample of cropped region with text (note color-bleed and white text blending into background):
Cropped region

If you are looking for known letters/digits in a known font at a specific position in the images, you may be better off with a template matching type of approach. Have a look at my answer here stackoverflow.com/a/40085218/2836621 - Mark Setchell

1 Answer


I would suggest using ImageMagick's -fx and -morphology Dilate operators to preprocess the image. To be honest, it will take some trial and error to find a solution that works for you. I would also recommend that whatever solution you develop handles errors gracefully (e.g. if an OCR attempt fails, emit a warning, advance the video to the next I-frame, and repeat).
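As a rough sketch of that kind of error handling, assuming the frames have already been extracted to image files and that you have some OCR command that prints the recognized text and exits non-zero on failure (both are assumptions; ocr_frames and the command name passed to it are hypothetical):

```shell
# Hypothetical helper: run OCR on each frame, warn and move on when it fails.
# $1 is the OCR command (an assumption, e.g. a wrapper around your OCR tool);
# the remaining arguments are frame image files.
ocr_frames() {
    ocr_cmd=$1
    shift
    for frame in "$@"; do
        # Treat a non-zero exit status or empty output as a failed attempt.
        if text=$("$ocr_cmd" "$frame" 2>/dev/null) && [ -n "$text" ]; then
            printf '%s\t%s\n' "$frame" "$text"
        else
            printf 'warning: OCR failed on %s, skipping\n' "$frame" >&2
        fi
    done
}

# Usage (my_ocr is a placeholder for your own command):
#   ocr_frames my_ocr frame_0001.png frame_0002.png
```

Successful results go to stdout and warnings to stderr, so the two can be logged separately.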

Fx Preprocessing

The -fx operator lets you apply a user-defined mathematical expression to each pixel. A quick search on chroma keying and other tolerance methods might be helpful. For many OCR techniques, though, it's common to first reduce the colors to a uniform grayscale.

convert aaA7b.png -fx 'intensity' intensity.png

intensity.png
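Since the question mentions producing a binary black/white image, one possible next step is to follow the grayscale conversion with a global -threshold. This is only a sketch, assuming a POSIX shell (on Windows, something like Git Bash or Cygwin); to_binary is a hypothetical helper name and the 60% default is a placeholder to tune, not a recommended value:

```shell
# Hypothetical helper: grayscale via -fx 'intensity', then a global threshold.
# $1 = input image, $2 = output image, $3 = threshold (placeholder default 60%).
to_binary() {
    convert "$1" -fx 'intensity' -threshold "${3:-60%}" "$2"
}

# Usage:
#   to_binary aaA7b.png binary.png 70%
```

Pixels brighter than the threshold become white and everything else black, so a changing background will likely need a different value per frame.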

Morphology Preprocessing

Morphology applies built-in or custom kernels to alter each pixel based on its neighbors. Because video scanlines and other artifacts are distorting the text, I would recommend exploring Dilate, but there are many other techniques listed in the ImageMagick Usage documentation.

Diamond

convert aaA7b.png -fx 'intensity' \
        -morphology Dilate Diamond:1 diamond.png

diamond.png

Square

convert aaA7b.png -fx 'intensity' \
        -morphology Dilate Square:1 square.png

square.png

Plus

convert aaA7b.png -fx 'intensity' \
        -morphology Dilate Plus:1 plus.png

plus.png
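Because this is a trial-and-error process, it can help to generate all of the built-in variants in one go and compare them side by side. A minimal sketch, assuming a POSIX shell; dilate_variants is a hypothetical helper and the radius argument is just the number after the colon in the commands above:

```shell
# Hypothetical helper: write one dilated image per built-in kernel shape.
# $1 = input image, $2 = kernel radius (defaults to 1, as in the examples).
dilate_variants() {
    src=$1
    radius=${2:-1}
    for shape in Diamond Square Plus; do
        # Output name is the lowercased shape, e.g. diamond.png
        out=$(printf '%s' "$shape" | tr '[:upper:]' '[:lower:]').png
        convert "$src" -fx 'intensity' \
                -morphology Dilate "$shape:$radius" "$out"
    done
}

# Usage:
#   dilate_variants aaA7b.png 2
```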

Custom

And if you need something more exact, create your own kernel by supplying it in the format "size: row1 row2 ... rowN". In this example, I'm creating a 3x3 kernel with a single vertical line to offset the video scanlines.

convert aaA7b.png -fx 'intensity' \
        -morphology Dilate \
        '3x3: nan,1,nan nan,1,nan nan,1,nan' user_defined.png

user_defined.png
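If you want to experiment with taller vertical-line kernels, the kernel string can be generated rather than typed by hand. This is a sketch only; vertical_kernel is a hypothetical helper, and it assumes an odd size so that there is a single middle column:

```shell
# Hypothetical helper: build an NxN user-defined kernel string whose only
# active column is the middle one (nan marks cells outside the neighborhood).
# $1 = kernel size, assumed odd (defaults to 3).
vertical_kernel() {
    n=${1:-3}
    mid=$(( (n + 1) / 2 ))
    kernel="${n}x${n}:"
    r=1
    while [ "$r" -le "$n" ]; do
        row=
        c=1
        while [ "$c" -le "$n" ]; do
            if [ "$c" -eq "$mid" ]; then v=1; else v=nan; fi
            row="${row:+$row,}$v"      # comma-separate values within a row
            c=$((c + 1))
        done
        kernel="$kernel $row"          # space-separate the rows
        r=$((r + 1))
    done
    printf '%s\n' "$kernel"
}

# vertical_kernel 3 prints: 3x3: nan,1,nan nan,1,nan nan,1,nan
# Usage:
#   convert aaA7b.png -fx 'intensity' \
#           -morphology Dilate "$(vertical_kernel 5)" vertical5.png
```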

But YMMV. Also take a look at Fred's TextCleaner script. The -deskew and -sharpen operators may also help clean up the image.

"Sample of cropped region with text (note color-bleed and white text blending into background)"

I think there's a saying along the lines of "You can't make steak from hamburger." At some point the background will wash out the text in the foreground, and your time is better spent on a solution that acknowledges this.