Text Recognition through AWS Rekognition Fails to Detect Majority of Text

Question

I am using AWS Rekognition to detect text in a PDF that has been converted to a JPEG. The image is a regular letter-size page with text at approximately 10-12 pt. However, the font changes several times throughout the image.

Are the missed detections and low confidence levels caused by the font changing so often, or by the small font size?

Essentially, what kind of image and text do I need to get the best results from a text detection algorithm?

Mausam Sharma · Accepted Answer · 2018-05-17T13:20:16

The DetectText API can detect up to 50 words in an image, and to be detected, text must be within +/- 30 degrees of the horizontal axis.

You are trying to extract a page full of text, and that's the problem :)
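To check whether an image is running into the word limit, one option is to count the WORD-level detections that come back. The following is only a sketch: `detected_words` and `detected_words_from_file` are hypothetical helper names, and it assumes boto3 is installed with AWS credentials and a region configured.

```python
def detected_words(client, image_bytes, min_confidence=0.0):
    """Return (text, confidence) pairs for WORD-level detections.

    `client` is expected to be a boto3 Rekognition client.
    """
    response = client.detect_text(Image={"Bytes": image_bytes})
    return [
        (d["DetectedText"], d["Confidence"])
        for d in response["TextDetections"]
        if d["Type"] == "WORD" and d["Confidence"] >= min_confidence
    ]


def detected_words_from_file(path):
    """Hypothetical convenience wrapper that calls the real service."""
    import boto3  # imported here so the helper above has no AWS dependency

    with open(path, "rb") as f:
        return detected_words(boto3.client("rekognition"), f.read())
```

If the number of words returned sits right at the limit for a page that clearly contains more, the image is probably too dense for DetectText.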

AWS now provides Amazon Textract, a service specifically intended for OCR on images and documents.
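For a full page of text, a call to Textract might look roughly like the sketch below. `extract_lines` and `extract_lines_from_file` are hypothetical helper names, and it assumes boto3 is installed with AWS credentials and a region where Textract is available.

```python
def extract_lines(client, image_bytes):
    """Return the text of every LINE block Textract finds in the image.

    `client` is expected to be a boto3 Textract client.
    """
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return [
        block["Text"]
        for block in response["Blocks"]
        if block["BlockType"] == "LINE"
    ]


def extract_lines_from_file(path):
    """Hypothetical convenience wrapper that calls the real service."""
    import boto3  # imported here so the helper above has no AWS dependency

    with open(path, "rb") as f:
        return extract_lines(boto3.client("textract"), f.read())
```

Calling `extract_lines_from_file` on the JPEG of the page should return the page text line by line. Each block also carries a `Confidence` value if you want to filter out weak results.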
