AWS Rekognition -- How to Parse All the Text in an Image?

Question

I am trying to parse the text in an image of a restaurant bill. I've set up the Ruby AWS SDK, which includes the Rekognition client, using this example. I have also been able to call Rekognition locally, passing it an image from my machine.

When I make the call with #detect_text (docs), the response contains TextDetections, each of which represents either a line or a word in the image. However, I would like the response to contain only TextDetections of type LINE. Here are my questions:

Is it possible to get a response back that only contains TextDetections of type LINE?
Is it possible to increase the limit of words detected in an image? According to the docs:

"DetectText can detect up to 50 words in an image"

That sounds like a hard limit to me.

Is there a way I can get around the limit of 50 words in an image? Perhaps I could make multiple calls on the same image so that Rekognition parses it repeatedly until it has all the words?

From my knowledge: no, you can't limit the response; no, you can't increase the limit; and no, you can't get around it directly, but cropping the image and parsing the pieces separately is a common method. You might be better off using an OCR library rather than Rekognition, since Rekognition's primary purpose is detecting objects rather than text. -- John Rotenstein
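Since the response cannot be restricted to lines on the service side, one option is to filter it on the client. The following is a minimal sketch, not a definitive implementation: `Detection` is a stand-in struct so the example runs without AWS credentials, `line_texts` is a hypothetical helper name, and `bill.jpg` is a placeholder path. It relies only on the `type` and `detected_text` fields that each TextDetection in the Ruby SDK response exposes.

```ruby
# Stand-in for Aws::Rekognition::Types::TextDetection (assumption: only the
# fields used below are modelled), so the sketch runs without calling AWS.
Detection = Struct.new(:type, :detected_text, :confidence)

# Hypothetical helper: keep only LINE detections and return their text,
# preserving the order in which Rekognition returned them.
def line_texts(detections)
  detections.select { |d| d.type == "LINE" }.map(&:detected_text)
end

# With the real SDK the call would look roughly like this:
#   require "aws-sdk-rekognition"
#   client = Aws::Rekognition::Client.new
#   resp = client.detect_text(image: { bytes: File.binread("bill.jpg") })
#   puts line_texts(resp.text_detections)

sample = [
  Detection.new("LINE", "Total 42.50", 99.1),
  Detection.new("WORD", "Total", 99.3),
  Detection.new("WORD", "42.50", 98.9)
]
puts line_texts(sample)
```

Filtering this way does not change the word limit; it only drops the WORD entries from what was already returned.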

Vamsi Vutukuru · Accepted Answer · 2020-04-01T19:03:27

Yes, that is a hard limit: you cannot detect more than 50 words in an image. A workaround is to crop the image into multiple smaller images and run DetectText on each cropped image.
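The cropping workaround can be sketched as follows. This is an illustration under stated assumptions rather than a recommended implementation: `tile_boxes` is a hypothetical helper, the image is split into horizontal bands (which suits a tall receipt), and a small overlap is added so that a line of text sitting on a band boundary is not cut in half. The actual cropping is left to whichever image library you use, and each cropped band would then be passed to `detect_text` separately.

```ruby
# Hypothetical helper: compute pixel boxes for `rows` horizontal bands that
# together cover a width x height image. Each band is extended by
# `overlap` (a fraction of the band height) above and below, clamped to
# the image bounds.
def tile_boxes(width, height, rows:, overlap: 0.1)
  band = height.to_f / rows
  pad = band * overlap
  (0...rows).map do |i|
    top = [(i * band - pad).floor, 0].max
    bottom = [((i + 1) * band + pad).ceil, height].min
    { left: 0, top: top, width: width, height: bottom - top }
  end
end

# Example: a 1000 x 3000 pixel photo of a bill split into three bands.
tile_boxes(1000, 3000, rows: 3).each { |box| puts box.inspect }
```

Because the bands overlap, a line near a boundary may be detected in two neighbouring bands, so the combined results may need de-duplicating. Doing that by text alone would also merge bill items that are legitimately repeated, so comparing positions is the safer approach.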
