I'm trying to use Microsoft's Computer Vision OCR API to get information from a table in an image. The trouble I'm having is that the returned data typically splits the text into all sorts of quirky regions, and I'm trying to piece those regions back together into full lines of readable, parseable text.
The only way I've thought of that makes any sense is to use the orientation to rotate the bounding box coordinates and then group "lines" whose vertical positions fall within a given percentage of another bounding box's height - perhaps 20% or so.
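For illustration, here's a rough Python sketch of that idea. It assumes the classic regions → lines → words response shape with comma-separated "x,y,w,h" boundingBox strings and a top-level textAngle field; the 20% tolerance and the two-space join between row fragments are arbitrary choices on my part.

```python
import math


def parse_box(bbox):
    """Split a comma-separated 'x,y,w,h' boundingBox string into ints."""
    x, y, w, h = (int(v) for v in bbox.split(","))
    return x, y, w, h


def rotate_point(x, y, angle_deg):
    """Rotate a point about the origin by -angle_deg so rows line up horizontally."""
    a = math.radians(-angle_deg)
    return x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a)


def group_lines(ocr_result, tolerance=0.2):
    """Merge OCR lines from every region into visual rows.

    Two lines are treated as part of the same row when the vertical distance
    between their rotation-corrected centres is less than `tolerance` times
    the line height (20% by default).
    """
    angle = ocr_result.get("textAngle", 0.0)

    # Flatten every line from every region into (centre_y, centre_x, height, text).
    entries = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            x, y, w, h = parse_box(line["boundingBox"])
            text = " ".join(word["text"] for word in line["words"])
            cx, cy = rotate_point(x + w / 2, y + h / 2, angle)
            entries.append((cy, cx, h, text))

    # Sweep top-to-bottom, starting a new row whenever the next line's centre
    # sits too far below the previous one.
    entries.sort(key=lambda e: (e[0], e[1]))
    rows, current = [], []
    for cy, cx, h, text in entries:
        if current and abs(cy - current[-1][0]) > tolerance * h:
            rows.append(current)
            current = []
        current.append((cy, cx, h, text))
    if current:
        rows.append(current)

    # Order each row's fragments left-to-right and join their text.
    return ["  ".join(e[3] for e in sorted(row, key=lambda e: e[1])) for row in rows]
```

Calling group_lines on the parsed JSON response would then give one string per visual row, which is at least easier to split into table columns.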
This is literally the only way I've thought of so far, and I'm beginning to think I'm overcomplicating this; is there a standard way that people tend to build up OCR regions to get readable text?