I've run into an issue with generating floating-point coordinates from an image.

The original problem is as follows: the input image is handwritten text. From this I want to generate a set of points (just x,y coordinates) that make up the individual characters.

At first I used findContours to generate the points. Since this finds the edges of the characters, the image first needs to be run through a thinning algorithm: I'm not interested in the shape of the characters, only the lines or, as in this case, the points.

Input:

[image: input]

Thinning:

[image: thinning]

So, I run my input through the thinning algorithm and all is fine; the output looks good. Running findContours on this, however, does not work well: it skips a lot of pixels and I end up with something unusable.

The second idea was to generate bounding boxes (with findContours), use these bounding boxes to grab the characters from the thinned image, collect the indices of all non-white pixels as "points", and offset them by the bounding box position. This generates even worse output, and seems like a bad method.

Horrible code for this:

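import java.util.ArrayList;
import java.util.List;
import org.opencv.core.Mat;
import org.opencv.core.Point;

// edges: the thinned image; bb: bounding Rect of one character.
// The index arithmetic below assumes a single-channel 8-bit Mat.
Mat temp = new Mat(edges, bb);
byte[] roi_buff = new byte[(int) (temp.total() * temp.channels())];
temp.get(0, 0, roi_buff);

int COLS = temp.cols();
List<Point> preArrayList = new ArrayList<Point>();

for (int i = 0; i < roi_buff.length; i++)
{
    if (roi_buff[i] != 0)
    {
        // Start at the box's top-left corner, then offset by the
        // pixel's column and row inside the ROI (scanline order).
        Point tempP = bb.tl();
        tempP.x += i % COLS;
        tempP.y += i / COLS;
        preArrayList.add(tempP);
    }
}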

Are there any alternatives, or am I overlooking something?

UPDATE:

I overlooked the fact that I need the points (pixels) to be ordered. In the method above I simply use a scanline approach to grab all the pixels. If you look at the 'o', for example, it would first grab the point on the left-hand side, then the one on the right-hand side. I need them to be ordered by their neighbouring pixels, since I want to draw paths with the points later on (outside of OpenCV). Is this possible?

Comment (prayforbacon): Updated the question with input and thinning result.

1 Answer

You should look into implementing your own connected components labelling. The concept is very simple: you scan the first line and assign unique labels to each horizontally connected strip of pixels. You basically check for every pixel if it is connected to its left neighbour and assign it either that neighbour's label or a new label. In the second row you do the same, but you also check against the pixels above it. Sometimes you need a label merge: two strips that were not connected in the previous row are joined in the current row. The way to deal with this is either to keep a list of label equivalences or use pointers to labels (so you can easily do a complete label change for an object).

findContours gives you a similar per-object grouping (it traces the borders of connected components rather than labelling every pixel), but if you implement the labelling yourself you have the freedom to go for 8-connectedness and even bridge a single-pixel or two-pixel gap. That way you get "almost-connected components labelling". It looks like you need this for the "w" in your example picture.
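The labelling described above, including the gap-bridging variant, could be sketched roughly as follows. This is only an illustration under some assumptions: the image is given as a plain boolean[][] (true = stroke pixel) rather than an OpenCV Mat, label merges are handled with a small union-find array instead of an explicit equivalence list, and the class name AlmostConnectedLabels and the reach parameter are made up for this example. With reach = 1 you get ordinary 8-connectedness; reach = 2 also joins pixels separated by a one-pixel gap.

```java
final class AlmostConnectedLabels {

    // Union-find lookup with path halving: returns the representative of i.
    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    // Returns a label per pixel: 0 for background, 1..n for components.
    // reach (hypothetical parameter): 1 = 8-connected, 2 = bridge one-pixel gaps.
    static int[][] label(boolean[][] fg, int reach) {
        int rows = fg.length, cols = fg[0].length;
        int[] parent = new int[rows * cols];
        for (int i = 0; i < parent.length; i++) parent[i] = i;

        // Pass 1: join each foreground pixel with foreground pixels already
        // visited in scan order (rows above, and to the left on the same row).
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (!fg[r][c]) continue;
                for (int dr = -reach; dr <= 0; dr++) {
                    for (int dc = -reach; dc <= reach; dc++) {
                        if (dr == 0 && dc >= 0) break; // not visited yet
                        int nr = r + dr, nc = c + dc;
                        if (nr < 0 || nc < 0 || nc >= cols || !fg[nr][nc]) continue;
                        int a = find(parent, r * cols + c);
                        int b = find(parent, nr * cols + nc);
                        if (a != b) parent[a] = b; // label merge
                    }
                }
            }
        }

        // Pass 2: map each representative to a compact label 1..n.
        int[] compact = new int[rows * cols];
        int next = 0;
        int[][] labels = new int[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (!fg[r][c]) continue;
                int root = find(parent, r * cols + c);
                if (compact[root] == 0) compact[root] = ++next;
                labels[r][c] = compact[root];
            }
        }
        return labels;
    }
}
```

For example, on the single row {true, false, true} a call with reach = 1 yields two separate labels, while reach = 2 puts both pixels in the same component. To use it on the thinned image you would first copy the Mat into a boolean[][], for instance from the byte buffer in the question.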

Once you have the image labelled this way, you can push all the pixels of a single label to a vector and order them something like this: find the top-left pixel, push it to a new vector and erase it from the original vector. Now find the pixel in the original vector closest to the one you just pushed, push it to the new vector and erase it from the original. Continue until all pixels have been transferred.
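A minimal sketch of that ordering step, assuming each pixel of one label is stored as an int[] {x, y} and that "top left" means smallest y with ties broken by smallest x (both are assumptions, as is the class name PixelOrdering). It is the plain greedy version, so it runs in O(n^2) per component.

```java
import java.util.ArrayList;
import java.util.List;

final class PixelOrdering {

    // Greedy nearest-neighbour ordering of {x, y} pixels belonging to one label.
    static List<int[]> order(List<int[]> pixels) {
        List<int[]> remaining = new ArrayList<>(pixels); // leave the input untouched
        List<int[]> ordered = new ArrayList<>();
        if (remaining.isEmpty()) return ordered;

        // Pick the start pixel: smallest y, then smallest x.
        int start = 0;
        for (int i = 1; i < remaining.size(); i++) {
            int[] p = remaining.get(i);
            int[] best = remaining.get(start);
            if (p[1] < best[1] || (p[1] == best[1] && p[0] < best[0])) start = i;
        }
        int[] current = remaining.remove(start);
        ordered.add(current);

        // Repeatedly move the pixel closest to the last one added.
        while (!remaining.isEmpty()) {
            int nearest = 0;
            long bestDist = Long.MAX_VALUE;
            for (int i = 0; i < remaining.size(); i++) {
                long dx = remaining.get(i)[0] - current[0];
                long dy = remaining.get(i)[1] - current[1];
                long dist = dx * dx + dy * dy; // squared distance is enough for comparison
                if (dist < bestDist) {
                    bestDist = dist;
                    nearest = i;
                }
            }
            current = remaining.remove(nearest);
            ordered.add(current);
        }
        return ordered;
    }
}
```

On a small ring of pixels like the 'o' from the question, this walks around the loop instead of jumping between its left and right sides. At junctions or stroke crossings the greedy choice can still take the wrong branch, so treat the result as a starting point.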

It will not be very fast this way, but it should be a start.