I'm trying to use Microsoft's Computer Vision OCR API to get information from a table in an image. The trouble I'm having is that the returned data typically splits the text into all sorts of quirky regions, and I'm trying to piece those regions back together into full lines of readable, parseable text.
The only way I've thought of that makes any sense is to use the orientation to rotate the bounding box coordinates and then group "lines" whose vertical positions fall within a given percentage of another bounding box's height - perhaps 20% or so.
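For illustration, here's a rough Python sketch of that idea. It assumes the classic regions → lines → words response shape with comma-separated "x,y,w,h" boundingBox strings and a top-level textAngle field; the 20% tolerance and the two-space join between row fragments are arbitrary choices on my part.

```python
import math


def parse_box(bbox):
    """Split a comma-separated 'x,y,w,h' boundingBox string into ints."""
    x, y, w, h = (int(v) for v in bbox.split(","))
    return x, y, w, h


def rotate_point(x, y, angle_deg):
    """Rotate a point about the origin by -angle_deg so rows line up horizontally."""
    a = math.radians(-angle_deg)
    return x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a)


def group_lines(ocr_result, tolerance=0.2):
    """Merge OCR lines from every region into visual rows.

    Two lines are treated as part of the same row when the vertical distance
    between their rotation-corrected centres is less than `tolerance` times
    the line height (20% by default).
    """
    angle = ocr_result.get("textAngle", 0.0)

    # Flatten every line from every region into (centre_y, centre_x, height, text).
    entries = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            x, y, w, h = parse_box(line["boundingBox"])
            text = " ".join(word["text"] for word in line["words"])
            cx, cy = rotate_point(x + w / 2, y + h / 2, angle)
            entries.append((cy, cx, h, text))

    # Sweep top-to-bottom, starting a new row whenever the next line's centre
    # sits too far below the previous one.
    entries.sort(key=lambda e: (e[0], e[1]))
    rows, current = [], []
    for cy, cx, h, text in entries:
        if current and abs(cy - current[-1][0]) > tolerance * h:
            rows.append(current)
            current = []
        current.append((cy, cx, h, text))
    if current:
        rows.append(current)

    # Order each row's fragments left-to-right and join their text.
    return ["  ".join(e[3] for e in sorted(row, key=lambda e: e[1])) for row in rows]
```

Calling group_lines on the parsed JSON response would then give one string per visual row, which is at least easier to split into table columns.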
This is literally the only way I've thought of so far, and I'm beginning to think I'm overcomplicating this; is there a standard way that people tend to build up OCR regions to get readable text?