
My task is to identify character patches within the document image. Consider the image below:

enter image description here

According to the paper, character patches are extracted by using an MSER-based method to detect character candidates.

"The main advantage of the MSER based method is that such algorithm is able to find most legible characters even when the document image is in low quality."

Another paper discusses MSER itself, but I'm having a hard time understanding it. Can anyone explain in simple terms the steps I should take to implement MSER and extract character patches from my sample document? I will implement it in Python, and I need to fully understand how MSER works.

Below are the steps to identify character patches in the document image, as I understand them (please correct me if I am wrong):

  1. "First, pixels are sorted by intensity"

    My comprehension:

    Say, for example, I have 5 pixels in an image with intensities 1 (Pixel 1), 9 (Pixel 2), 255 (Pixel 3), 3 (Pixel 4), and 4 (Pixel 5). Sorted by increasing intensity, the order becomes Pixel 1, 4, 5, 2, 3.

  2. After sorting, pixels are placed in the image (either in decreasing or increasing order) and the list of connected components and their areas is maintained using the efficient union-find algorithm.

    My comprehension:

    Using the example from step 1, the pixels would be arranged as below. The component/group numbers and the X,Y coordinates are made up for illustration.

     Pixel Number | Intensity Level | Pixel Component/Group | Image X,Y Coordinates
          1       |        1        |  Pixel Component # 5 | (14,12)
          4       |        3        |  Pixel Component # 1 | (234,213)
          5       |        4        |  Pixel Component # 2 | (231,14)
          2       |        9        |  Pixel Component # 3 | (23,21)
          3       |      255        |  Pixel Component # 1 | (234,214)
    
  3. "The process produces a data structure storing the area of each connected component as a function of intensity."

    My comprehension:

    A column called Area will be added to the table in step 2. It will count the number of pixels in a specific component at a given intensity level. It's like an aggregation of the pixels within the component group that share the same intensity level.

  4. "Finally, intensity levels that are local minima of the rate of change of the area function are selected as thresholds producing MSER. In the output, each MSER is represented by position of a local intensity minimum (or maximum) and a threshold."
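
The quoted steps 1 to 3 can be sketched roughly as follows. This is only a minimal illustration, not the paper's implementation. It assumes a tiny grayscale image stored as a list of lists, 4-connectivity between pixels, and pixels added in increasing intensity order. The function name `component_areas_by_level` and the sample image are hypothetical.

```python
def component_areas_by_level(image):
    """Add pixels in increasing intensity order and return
    {intensity level: sorted list of component areas at that level}."""
    height, width = len(image), len(image[0])
    parent = {}  # pixel -> parent pixel (union-find forest)
    area = {}    # root pixel -> number of pixels in its component

    def find(p):
        # Follow parent links to the root, shortening the path on the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if area[ra] < area[rb]:
            ra, rb = rb, ra
        parent[rb] = ra              # attach the smaller tree to the larger one
        area[ra] += area.pop(rb)     # the merged component keeps one area entry

    # Step 1: sort pixel coordinates by intensity.
    pixels = sorted(
        ((y, x) for y in range(height) for x in range(width)),
        key=lambda p: image[p[0]][p[1]],
    )

    history = {}
    for i, (y, x) in enumerate(pixels):
        level = image[y][x]
        # Step 2: place the pixel and merge it with already placed 4-neighbours.
        parent[(y, x)] = (y, x)
        area[(y, x)] = 1
        for neighbour in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if neighbour in parent:
                union((y, x), neighbour)
        # Step 3: once all pixels of this level are placed, record the areas.
        last_of_level = (
            i + 1 == len(pixels)
            or image[pixels[i + 1][0]][pixels[i + 1][1]] != level
        )
        if last_of_level:
            history[level] = sorted(area.values())
    return history


image = [
    [1, 1, 9],
    [1, 9, 9],
    [9, 9, 3],
]
print(component_areas_by_level(image))
# {1: [3], 3: [1, 3], 9: [9]}
```

Note that the area belongs to a whole connected component at a given threshold (all placed pixels with intensity up to that level), not only to the pixels that have exactly that intensity.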

How do I get the local minima of the rate of change of the area function?

Please help me understand what MSER is and how to implement it. Feel free to correct my understanding. Thanks.


1 Answer


In one article the authors track a value they call "stability" (roughly, the rate of change of area when going from region to region in their data structure), and then find the regions corresponding to local minima of that value (a local minimum is a point where the value of interest is smaller than at its nearest neighbors). If that is of any help, here is a C++ implementation of MSER (based on another article).
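
As a rough sketch of the local-minimum idea, assume you already have the area of one growing region at consecutive thresholds. The commonly used relative form of stability is q(i) = (area[i + delta] - area[i - delta]) / area[i]. The function names, the `delta` default, and the sample areas below are made up for illustration and are not taken from the linked implementation.

```python
def stability(areas, delta=1):
    """Relative area change q(i) = (areas[i + delta] - areas[i - delta]) / areas[i].

    Only indices that have a value delta steps away on both sides get a score.
    """
    return {
        i: (areas[i + delta] - areas[i - delta]) / areas[i]
        for i in range(delta, len(areas) - delta)
    }


def local_minima(values):
    """Indices whose score is strictly smaller than the scores at both neighbours."""
    return [
        i for i in values
        if i - 1 in values and i + 1 in values
        and values[i] < values[i - 1] and values[i] < values[i + 1]
    ]


# Hypothetical areas of a single region as the threshold increases.
areas = [10, 40, 50, 52, 53, 90, 200]
q = stability(areas)
print(local_minima(q))
# [3]
```

Here the region barely grows around index 3 (50, 52, 53 pixels), so its score is lowest there, and that threshold would be the one reported as maximally stable.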