OpenCV: Isolating licence plate characters for OCR

Question

I am attempting to read license plates automatically. I have trained an OpenCV Haar Cascade classifier to isolate license plates in a source image, with reasonable success. Here is an example (note the black bounding rectangle). Following this, I attempt to clean up the license plate for either:

Isolating individual characters for classification via a SVM.
Providing the cleaned license plate to Tesseract OCR with a whitelist of valid characters.

To clean up the plate, I perform the following transforms:

import cv2

# Assuming 'plate' is a single-channel (grayscale) sub-image featuring the isolated license plate
height, width = plate.shape
# Enlarge the license plate
cleaned = cv2.resize(plate, (width * 3, height * 3))
# Perform an adaptive threshold (block size 11, constant 7)
cleaned = cv2.adaptiveThreshold(cleaned, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 7)
# Remove any residual noise with a morphological close using an elliptical kernel
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)

My goal here is to isolate the characters to black and the background to white while removing any noise.

Using this method, I find I generally get one of three results:

Image too noisy.

Too much removed (characters disjointed).

Reasonable (all characters isolated and consistent).

I've included the original images and cropped plates in this album.

I realise that due to the inconsistent nature of license plates, I will likely need a more dynamic clean-up method, but I'm not sure where to get started. I've tried playing with the parameters of the threshold and morphology functions, but this generally just leads to over-tuning towards one image.

How can I improve my clean-up function?

What you call "too noisy" is actually "too many extra characters or graphics", which is a common feature on American plates. — Yves Daoust

Yves Daoust · Accepted Answer · 2017-07-28T06:58:41

What you are trying to do is pretty challenging, and the samples that you show are still among the easy ones.

In the first place, it is important to obtain a good delimitation of the main characters area.

For the vertical delimitation, try to find horizontal white lines that act as separators. For harder cases such as the "too noisy" one, you can compute statistics along horizontal lines, such as the distribution of white and black runs (count, average length, deviation of the lengths), and find parameters that discriminate between lines crossing true characters and lines crossing extra features (by the way, this will implicitly detect the white lines).
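As a rough illustration of those run statistics, here is a minimal NumPy sketch. It assumes the plate is already binarised with black characters (0) on a white background (255), as in the question. The names run_lengths and row_statistics are my own, not OpenCV functions, and only black runs are measured here; white runs could be handled the same way.

```python
import numpy as np

def run_lengths(line):
    """Lengths of consecutive runs of True values in a 1-D boolean array."""
    padded = np.concatenate(([False], line, [False]))
    # Indices where the value changes; they alternate run start, run end.
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return changes[1::2] - changes[0::2]

def row_statistics(binary):
    """Per-row (count, mean length, std of lengths) of black runs.

    binary: 2-D uint8 array, characters black (0) on white (255).
    Returns a float array of shape (rows, 3).
    """
    stats = []
    for row in binary:
        runs = run_lengths(row == 0)
        if runs.size == 0:
            # A pure white line: a natural separator candidate.
            stats.append((0.0, 0.0, 0.0))
        else:
            stats.append((float(runs.size), float(runs.mean()), float(runs.std())))
    return np.array(stats)
```

Rows crossing true characters should show a moderate number of runs of similar length, whereas a plate border or a slogan line tends to give either one very long run or many very short ones. Which thresholds separate them has to be found on your own images.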

Doing so, you will obtain rectangles formed by rows of the same type, which may accidentally be fragmented. Try to merge the rectangles that seem to belong to true characters. The next step of processing will be limited to this rectangle.
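Merging fragmented rows could be sketched as follows. This assumes you already have one boolean flag per image row saying whether the row looks like it crosses true characters (how you derive that flag depends on the discriminating parameters you found). The name longest_band and the max_gap parameter are hypothetical, and keeping only the longest band is a simplification.

```python
def longest_band(row_is_character, max_gap=2):
    """Return (top, bottom) of the longest band of character-like rows.

    Gaps of up to max_gap consecutive non-character rows are bridged, so a
    band that was accidentally fragmented is merged back together.
    bottom is exclusive, so the band is image[top:bottom].
    """
    best = (0, 0)
    start = None       # first row of the band being built
    last_true = None   # last character-like row seen in that band
    gap = 0            # consecutive non-character rows since last_true
    for i, flag in enumerate(row_is_character):
        if flag:
            if start is None:
                start = i
            last_true = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap > max_gap:
                # The gap is too large to bridge: close the current band.
                if last_true + 1 - start > best[1] - best[0]:
                    best = (start, last_true + 1)
                start = None
    if start is not None and last_true + 1 - start > best[1] - best[0]:
        best = (start, last_true + 1)
    return best
```

For example, with rows 5 to 9 and 11 to 14 flagged and row 10 missed, longest_band(flags, max_gap=2) returns (5, 15), i.e. one merged band.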

For the horizontal delimitation, things aren't so easy because you will see cases where the characters are split so that vertical lines can traverse them, and cases where distinct characters are joined by dirt or other clutter. (In some terrible cases, characters can be touching over an extended area.)

By a technique similar to the above, find candidate vertical lines. Now you have little choice other than forming several hypotheses by enumerating the possible combinations of these separators, constrained by the fact that the characters have a minimum spacing (between their axes).
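A minimal sketch of that enumeration, under several assumptions of mine: the image is the binarised character band (black on white), a candidate separator is the centre of any run of columns with almost no black pixels, the number of characters n_chars is known in advance, and the spacing constraint is approximated by a minimum distance min_width between consecutive separators rather than between character axes. All function and parameter names are hypothetical.

```python
from itertools import combinations

import numpy as np

def candidate_cuts(binary, max_ink=0):
    """x positions of candidate vertical separators.

    A column is 'quiet' if it holds at most max_ink black pixels; each
    contiguous run of quiet columns contributes its centre as one candidate.
    """
    ink = (binary == 0).sum(axis=0)
    quiet = ink <= max_ink
    cuts = []
    start = None
    for x, is_quiet in enumerate(quiet):
        if is_quiet and start is None:
            start = x
        elif not is_quiet and start is not None:
            cuts.append((start + x - 1) // 2)
            start = None
    if start is not None:
        cuts.append((start + len(quiet) - 1) // 2)
    return cuts

def segmentations(cuts, n_chars, min_width):
    """Yield every choice of n_chars + 1 separators (in left-to-right order)
    whose consecutive separators are at least min_width columns apart."""
    for combo in combinations(cuts, n_chars + 1):
        if all(b - a >= min_width for a, b in zip(combo, combo[1:])):
            yield combo
```

The number of combinations grows quickly with the number of candidates, so in practice you would prune candidates first (for instance by raising min_width) before enumerating.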

After you have formed these hypotheses, you can decide on the best combination by performing character recognition and computing an overall score. (At this stage, I don't think it is possible to perform the segmentation without knowing the possible shapes of the characters, and this is why recognition enters into play.)
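Scoring the hypotheses might look like the sketch below. It assumes each hypothesis is a tuple of separator x positions and that you supply a recognise callable returning a (label, confidence) pair for a glyph image, with confidence in [0, 1]. That callable is a placeholder for whatever classifier you use (your SVM, for instance), and taking the mean confidence as the overall score is just one plausible choice.

```python
import numpy as np

def best_segmentation(binary, hypotheses, recognise):
    """Pick the hypothesis whose glyphs are recognised with the highest
    mean confidence.

    binary:     2-D array holding the character band.
    hypotheses: iterable of tuples of separator x positions.
    recognise:  hypothetical callable, glyph image -> (label, confidence).
    Returns (cuts, text, score); cuts is None if no hypothesis was usable.
    """
    best_cuts, best_text, best_score = None, "", -1.0
    for cuts in hypotheses:
        # Slice the band into one glyph per pair of consecutive separators.
        results = [recognise(binary[:, a:b]) for a, b in zip(cuts, cuts[1:])]
        if not results:
            continue
        score = sum(conf for _, conf in results) / len(results)
        if score > best_score:
            best_cuts = cuts
            best_text = "".join(label for label, _ in results)
            best_score = score
    return best_cuts, best_text, best_score
```

A hypothesis that cuts a character in half should produce glyphs the recogniser is unsure about, which lowers its mean confidence relative to the correct segmentation.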
