My task is to identify character patches within the document image. Consider the image below:
According to the paper, character patches are extracted with an MSER-based method that detects character candidates:
"The main advantage of the MSER based method is that such algorithm is able to find most legible characters even when the document image is in low quality."
Another paper discusses MSER, but I'm having a hard time understanding it. Can anyone explain in simple terms the steps I should take to implement MSER
and extract character patches from my sample document? I will implement it in Python, so I need to fully understand how MSER works.
Below are the steps to identify character patches in the document image, as I understand them (please correct me if I am wrong):
1. "First, pixels are sorted by intensity"
My comprehension:
Say, for example, I have 5 pixels in an image with intensities
(Pixel 1) 1, (Pixel 2) 9, (Pixel 3) 255, (Pixel 4) 3, (Pixel 5) 4
respectively. Sorted in increasing order of intensity, this yields Pixels 1, 4, 5, 2 and 3.
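To check my understanding of this step, here is a minimal sketch in plain Python using the five example pixels. The names `intensities` and `sort_pixels_by_intensity` are my own and not from the paper, and I am assuming an ordinary comparison sort is acceptable for illustration (since intensities only range over 0..255, a bucket/counting sort would presumably work too).

```python
# Intensities of the five example pixels, keyed by pixel number (1-based).
intensities = {1: 1, 2: 9, 3: 255, 4: 3, 5: 4}


def sort_pixels_by_intensity(values):
    """Return the pixel numbers ordered from darkest to brightest."""
    # Iterating a dict yields its keys; each key is ranked by its intensity.
    return sorted(values, key=values.get)


order = sort_pixels_by_intensity(intensities)
print(order)  # [1, 4, 5, 2, 3]
```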
2. "After sorting, pixels are placed in the image (either in decreasing or increasing order) and the list of connected components and their areas is maintained using the efficient union-find algorithm."
My comprehension:
Using the example in number 1, the pixels will be arranged as below. The pixel component/group and image X,Y coordinates are just examples.
Pixel Number | Intensity Level | Pixel Component/Group | Image X,Y Coordinates
1 | 1 | Pixel Component # 5 | (14,12)
4 | 3 | Pixel Component # 1 | (234,213)
5 | 4 | Pixel Component # 2 | (231,14)
2 | 9 | Pixel Component # 3 | (23,21)
3 | 255 | Pixel Component # 1 | (234,214)
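Here is a rough sketch of how I imagine the "place pixels and maintain components with union-find" step, so that it can be corrected if I have it wrong. It is plain Python on a tiny made-up 3x3 image; I am assuming 4-connectivity and increasing intensity order, and the names `component_areas_by_threshold`, `parent`, `area` and `history` are my own, not from the paper.

```python
def component_areas_by_threshold(image):
    """For each intensity level t, return the areas of the connected
    components formed by pixels with intensity <= t (4-connectivity)."""
    height, width = len(image), len(image[0])
    parent = {}  # pixel -> parent pixel (union-find forest)
    area = {}    # root pixel -> number of pixels in its component

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if area[ra] < area[rb]:  # attach the smaller component to the larger
            ra, rb = rb, ra
        parent[rb] = ra
        area[ra] += area.pop(rb)

    # Step 1: sort pixels by intensity (ties broken by position).
    pixels = sorted((image[y][x], y, x)
                    for y in range(height) for x in range(width))

    history = {}
    for i, (level, y, x) in enumerate(pixels):
        p = (y, x)
        parent[p] = p
        area[p] = 1
        # Merge with any 4-neighbour that has already been placed.
        for q in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if q in parent:
                union(p, q)
        # Record the component areas once every pixel at this level is placed.
        if i + 1 == len(pixels) or pixels[i + 1][0] != level:
            history[level] = sorted(area.values(), reverse=True)
    return history


image = [
    [1, 1, 9],
    [1, 9, 9],
    [9, 9, 3],
]
history = component_areas_by_threshold(image)
print(history)  # {1: [3], 3: [3, 1], 9: [9]}
```

In this toy image the three dark pixels form one component of area 3 at level 1, the isolated pixel with intensity 3 appears as a second component at level 3, and everything merges into a single component of area 9 at level 9.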
3. "The process produces a data structure storing the area of each connected component as a function of intensity."
My comprehension:
A column called Area will be added to the table in #2. It will count the number of pixels in a specific component with the same intensity level. It's like an aggregation of pixels within the component group with the same intensity level.
4. "Finally, intensity levels that are local minima of the rate of change of the area function are selected as thresholds producing MSER. In the output, each MSER is represented by position of a local intensity minimum (or maximum) and a threshold."
How do I get the local minima of the rate of change of the area function?
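My current guess at this step, as a sketch only: for one component that keeps growing as the threshold rises, take its area at each threshold, compute the relative growth q(t) = (area[t + delta] - area[t - delta]) / area[t], and keep the thresholds where q is smaller than at both neighbouring thresholds. The function name `stable_thresholds`, the parameter `delta` and the area values below are made up for illustration, and I am ignoring plateaus where q stays constant, which a real implementation would presumably need to handle.

```python
def stable_thresholds(areas, delta=1):
    """areas[t] is the area of one growing component at threshold t.

    Returns (q, minima), where q[t] is the relative area change around t
    and minima lists the thresholds at which q is a strict local minimum.
    """
    q = {}
    for t in range(delta, len(areas) - delta):
        q[t] = (areas[t + delta] - areas[t - delta]) / areas[t]

    minima = []
    # Only thresholds with a q value on both sides can be local minima.
    for t in range(delta + 1, len(areas) - delta - 1):
        if q[t] < q[t - 1] and q[t] < q[t + 1]:
            minima.append(t)
    return q, minima


# Made-up areas for thresholds 0..7: fast growth, a stable stretch, fast growth.
areas = [10, 30, 48, 52, 53, 54, 80, 140]
q, minima = stable_thresholds(areas)
print(minima)  # [4]
```

Here the area barely changes around threshold 4, so q is lowest there and that threshold would be the one selected for this component.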
Please help me understand what MSER is and how to implement it. Feel free to correct my understanding. Thanks.