
I'm exploring the Google Vision API for OCR. We have lots of forms that are computer-generated and then filled in by users, such as medical reports and registration forms. We need to process images of these forms and extract the text from them. The Google Vision API works great on the computer-generated parts, but the handwritten entries cause problems: if the user writes the data slightly above the printed line (shifted on the Y axis), the words are treated as part of the previous or next line. For example, the output is:

Study Contact Name:
Test

Expected:

Study Contact Name: Test

[Image: the form used]

Code reference: https://cloud.google.com/vision/docs/detecting-text#vision-text-detection-java

Is there a way to get this on one line, or to tell whether a word is part of that line?

Is there any other API that can help in this scenario?


2 Answers


As for "any other API that can help in this scenario": if you mean an OCR API, I don't think any of them performs well on handwritten documents, or at least none is significantly better than Google.

Anyway, one possible approach, which I use personally, is to write your own method that assigns letters/words to lines.

This way, you control how much vertical distance between words still counts as the same "line".

The Google API gives you the X and Y position of each recognized letter. So you can simply iterate over all letters or words and put them on the same line if their Y positions are within a small tolerance of each other (e.g. 2 pixels).
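A minimal Java sketch of this grouping, assuming each word has already been reduced to its text, an X coordinate, and a vertical-center Y coordinate. The `Word` record and `groupIntoLines` method are hypothetical names for illustration; with the Google Java client the coordinates would come from each `EntityAnnotation`'s `getBoundingPoly().getVerticesList()`. The coordinates and tolerance in `main` are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class LineGrouper {

    // Hypothetical stand-in for one recognized word: its text plus the
    // left X and vertical-center Y of its bounding box, in pixels.
    record Word(String text, int x, int y) {}

    // Groups words into lines: a word joins the current line if its Y is
    // within `tolerance` pixels of the Y of the line's first (topmost) word.
    static List<String> groupIntoLines(List<Word> words, int tolerance) {
        List<Word> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.comparingInt(Word::y)); // top to bottom

        List<List<Word>> lines = new ArrayList<>();
        List<Word> current = new ArrayList<>();
        int lineY = 0;
        for (Word w : sorted) {
            if (!current.isEmpty() && Math.abs(w.y() - lineY) > tolerance) {
                lines.add(current);          // too far from this line: close it
                current = new ArrayList<>();
            }
            if (current.isEmpty()) {
                lineY = w.y();               // first word anchors the new line
            }
            current.add(w);
        }
        if (!current.isEmpty()) {
            lines.add(current);
        }

        // Within each line, order words left to right and join with spaces.
        List<String> result = new ArrayList<>();
        for (List<Word> line : lines) {
            line.sort(Comparator.comparingInt(Word::x));
            result.add(line.stream().map(Word::text).collect(Collectors.joining(" ")));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up coordinates: the handwritten "Test" sits 6 px above the
        // printed label, so a strict line split would separate them.
        List<Word> words = List.of(
                new Word("Study", 10, 100),
                new Word("Contact", 60, 100),
                new Word("Name:", 130, 101),
                new Word("Test", 200, 94),
                new Word("Date:", 10, 140));
        groupIntoLines(words, 8).forEach(System.out::println);
    }
}
```

With a tolerance of 8 this prints `Study Contact Name: Test` and `Date:`; with a tolerance of 2 the handwritten `Test` ends up on its own line, which reproduces the problem in the question. Anchoring each line on its first word keeps the sketch simple; for handwriting you may prefer a tolerance proportional to the text height rather than a fixed pixel count.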


I am probably too late for you, but since I got here with a similar question, I'll share what I found:

  1. Google's API is much better nowadays at recognizing handwritten forms. At least in my tests, the Google Vision API works fine. The problem is identifying the structure of the form: I don't know how to tell Google's API to look for a table or for specific fields.
  2. I have found a promising service you might also be interested in: Azure Form Recognizer.