Google Vision OCR data form

Question

I'm exploring the Google Vision API for OCR. We have lots of forms that are computer-generated and then filled in by users, such as medical reports and registration forms. We need to process images of those forms and extract the text. I've tried the Google Vision API and it works great for computer-generated forms, but the ones filled in by hand cause problems. For example, if a field is filled in slightly above the line of its printed label, the handwritten words are treated as part of the previous or next line. Below is the output:

Study Contact Name:
Test

Expected:

Study Contact Name: Test

The Form used

Code reference: https://cloud.google.com/vision/docs/detecting-text#vision-text-detection-java

Is there a way to get this on one line, or to tell that it's part of that line?

Is there any other API that can help in this scenario?

Lethos Lethos · Accepted Answer · 2018-09-14T08:09:06

"Any other API that can help in this scenario": if you mean an OCR API, I don't think any of them performs well on handwritten documents, or at least not significantly better than Google's.

Anyway, one possible approach, which I use personally, is to write your own method that assigns letters/words to lines.

This way, you control how far apart vertically two words can be while still being treated as part of the same "line".

The Google API gives you X and Y position information for each recognized letter or word. So you can simply iterate over all the letters or words and put them on the same line if their Y positions differ by no more than some tolerance (2 pixels, for example).
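The grouping described above could be sketched roughly as follows. This is only an illustration, not the exact code I use: `LineGrouper`, `Word` and `groupIntoLines` are hypothetical names, and `Word` is a plain holder you would fill yourself from the API response (in the Java client, each `EntityAnnotation` has `getDescription()` and `getBoundingPoly().getVerticesList()`, whose `Vertex` entries expose `getX()` and `getY()`). The sketch assumes you have already reduced each bounding box to a left X and a vertical centre Y, and that the page is not noticeably rotated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class LineGrouper {

    /** Hypothetical holder for one recognized word; not part of the Vision client. */
    static final class Word {
        final String text;
        final int x;        // left edge of the bounding box
        final int centerY;  // vertical centre of the bounding box

        Word(String text, int x, int centerY) {
            this.text = text;
            this.x = x;
            this.centerY = centerY;
        }
    }

    /**
     * Groups words into lines: a word joins the current line when its centre Y
     * is within `tolerance` pixels of the first (topmost) word of that line.
     */
    static List<String> groupIntoLines(List<Word> words, int tolerance) {
        // Work on a copy sorted top to bottom so lines come out in reading order.
        List<Word> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.comparingInt((Word w) -> w.centerY));

        List<List<Word>> lines = new ArrayList<>();
        List<Word> current = null;
        int anchorY = 0;
        for (Word w : sorted) {
            // Start a new line when the word is too far below the line's anchor.
            if (current == null || w.centerY - anchorY > tolerance) {
                current = new ArrayList<>();
                lines.add(current);
                anchorY = w.centerY;
            }
            current.add(w);
        }

        // Within each line, order words left to right and join them with spaces.
        List<String> result = new ArrayList<>();
        for (List<Word> line : lines) {
            line.sort(Comparator.comparingInt((Word w) -> w.x));
            StringBuilder sb = new StringBuilder();
            for (Word w : line) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(w.text);
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

For handwriting you will probably need a tolerance well above 2 pixels; something proportional to the height of the bounding boxes may work better than a fixed number, but that is a guess you would have to tune on your own forms.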
