
I'm working on performing OCR of energy meter displays (example 1, example 2, example 3).

I tried tesseract-ocr with the letsgodigital trained data, but the performance is very poor.

I'm fairly new to the topic and this is what I've done:

import numpy as np
import cv2
import imutils
from skimage import exposure
from pytesseract import image_to_string
import PIL
from google.colab.patches import cv2_imshow  # cv2_imshow is Colab-specific


def process_image(orig_image_arr):

  gry_disp_arr = cv2.cvtColor(orig_image_arr, cv2.COLOR_BGR2GRAY)
  gry_disp_arr = exposure.rescale_intensity(gry_disp_arr, out_range=(0, 255))

  #thresholding
  ret, thresh = cv2.threshold(gry_disp_arr,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
  
  return thresh

def ocr_image(orig_image_arr):
  otsu_thresh_image = process_image(orig_image_arr)
  cv2_imshow(otsu_thresh_image)
  return image_to_string(otsu_thresh_image, lang="letsgodigital", config="--psm 8 -c tessedit_char_whitelist=.0123456789")

img1 = cv2.imread('test2.jpg')
cnv = cv2.cvtColor(img1, cv2.COLOR_BGR2RGB)
text = ocr_image(cnv)

This gives very poor results with the example images. I have a couple of questions:
1. How can I identify the four corners of the display? (Edge detection doesn't seem to work very well.)
2. Is there any further preprocessing that I can do to improve the performance?

Thanks for any help.


1 Answer


Notice how your power meters use either blue or green LEDs to light up the display; I suggest you use this color to your advantage. What I'd do is select only the one RGB channel that matches the LED color, then threshold it based on some algorithm or assumption. After that, you can do the downstream steps of cropping / resizing / transformation / OCR.


For example, using your example image 1, look at its histogram here. Notice how there is a small peak of green to the right of the 150 mark.

I take advantage of this and set anything below 150 to zero; my assumption is that the green peak corresponds to the bright green LED digits in the image.

img = cv2.imread('example_1.jpg', 1)

# Get only green channel
img_g = img[:,:,1]
# Set threshold for green value, anything less than 150 becomes zero
img_g[img_g < 150] = 0

This is what I get. This should be much easier for downstream OCR now.

# Also set anything >= 150 to the max value, to get a clean binary image
img_g[img_g >= 150] = 255

The above steps should replace this line from your original code:

_ret, thresh = cv2.threshold(img_g, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

Here's the output of this step.
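Putting the pieces together, here is a sketch of how the green-channel threshold could slot into the question's pipeline. The OCR call is shown commented out since it needs tesseract plus the letsgodigital traineddata installed; the filename and the 150 cutoff are assumptions carried over from above:

```python
import numpy as np

def process_image(orig_image_arr, thresh_val=150):
    # Green-channel threshold replacing the grayscale + Otsu step
    green = orig_image_arr[:, :, 1]  # OpenCV loads BGR; index 1 is green
    return np.where(green >= thresh_val, 255, 0).astype(np.uint8)

# With tesseract and the letsgodigital data installed, OCR would then be e.g.:
# import cv2
# from pytesseract import image_to_string
# text = image_to_string(process_image(cv2.imread('example_1.jpg')),
#                        lang="letsgodigital",
#                        config="--psm 8 -c tessedit_char_whitelist=.0123456789")

# Quick check on a synthetic frame: bright-green "digits" on a dark background
frame = np.zeros((60, 120, 3), dtype=np.uint8)
frame[20:40, 10:110] = (0, 220, 0)
binary = process_image(frame)
print(binary.min(), binary.max())  # 0 255
```

For a blue-lit meter the same idea applies with channel index 0 instead of 1.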