I am attempting to automatically read license plates.
I have trained an OpenCV Haar cascade classifier to isolate license plates in a source image, with reasonable success. Here is an example (note the black bounding rectangle).
Following this, I attempt to clean up the license plate for either:
- Isolating individual characters for classification via an SVM.
- Providing the cleaned license plate to Tesseract OCR with a whitelist of valid characters.
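For concreteness, the Tesseract option could look roughly like the sketch below. It assumes the `pytesseract` wrapper, which is not mentioned above, and a hypothetical whitelist of uppercase letters and digits; `build_tesseract_config` and `read_plate` are illustrative names only. Whitelist support also varies between Tesseract versions and engine modes, so treat this as a starting point rather than a confirmed setup.

```python
import string

# Hypothetical whitelist: uppercase letters and digits. Adjust it to the
# characters that can actually appear on the plates being read.
PLATE_CHARS = string.ascii_uppercase + string.digits


def build_tesseract_config(whitelist=PLATE_CHARS, psm=7):
    """Build a Tesseract config string that restricts output to `whitelist`.

    --psm 7 asks Tesseract to treat the image as a single line of text.
    """
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def read_plate(cleaned):
    """Run Tesseract on a cleaned plate image.

    Assumes pytesseract and the Tesseract binary are installed.
    """
    import pytesseract  # imported lazily so the config helper works without it

    text = pytesseract.image_to_string(cleaned, config=build_tesseract_config())
    return text.strip()
```

With this in place, `read_plate(cleaned)` would be called on the output of the cleanup step.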
To clean up the plate, I perform the following transforms:
import cv2
# Assuming 'plate' is a grayscale (single-channel, 8-bit) sub-image containing the isolated license plate
height, width = plate.shape[:2]
# Enlarge the license plate by a factor of 3
cleaned = cv2.resize(plate, (width * 3, height * 3))
# Adaptive threshold: Gaussian-weighted 11x11 neighbourhood, constant C = 7
cleaned = cv2.adaptiveThreshold(cleaned, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 7)
# Remove residual noise with a morphological closing using a 3x3 elliptical kernel
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)
My goal here is to render the characters in black and the background in white while removing any noise.
Using this method, I find I generally get one of three results:
- Image too noisy.
- Too much removed (characters disjointed).
- Reasonable (all characters isolated and consistent).
I've included the original images and cropped plates in this album.
I realise that, because license plates vary so much, I will likely need a more dynamic cleanup method, but I'm not sure where to start. I've tried adjusting the parameters of the threshold and morphology functions, but this generally just over-tunes the pipeline to one image.
How can I improve my cleanup function?