
I am using AWS Rekognition to detect text in a PDF page that has been converted to a JPEG. The text in the image is roughly 10-12 pt, as on a regular letter-size page. However, the font changes several times throughout the image.

Are the missed detections and low confidence levels caused by the frequent font changes in the document? Or by the small font size?

Essentially, what kind of image and text do I need to get the best results from a text detection algorithm?

1 Answer

This is from the official documentation: the DetectText API can detect up to 50 words in an image, and to be detected, text must be within +/- 30 degrees orientation of the horizontal axis.

You are trying to extract a full page of text, which is well beyond the 50-word limit. That's the problem :)
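One way to check whether you are running into the word limit is to count the WORD entries in the DetectText response. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the helper names (`word_detections`, `looks_truncated`, `detect_text_in_file`) and the `WORD_LIMIT` constant are made up for illustration, and the limit value is simply the figure quoted above.

```python
WORD_LIMIT = 50  # per-image word limit quoted from the documentation above


def word_detections(response):
    """Return (text, confidence) for every WORD entry in a DetectText response."""
    return [
        (d["DetectedText"], d["Confidence"])
        for d in response.get("TextDetections", [])
        if d.get("Type") == "WORD"
    ]


def looks_truncated(response, limit=WORD_LIMIT):
    """Heuristic: a word count at the limit suggests some text was dropped."""
    return len(word_detections(response)) >= limit


def detect_text_in_file(path, region_name=None):
    """Send a local JPEG/PNG file to Rekognition DetectText."""
    import boto3  # imported lazily so the helpers above work without AWS access

    client = boto3.client("rekognition", region_name=region_name)
    with open(path, "rb") as f:
        return client.detect_text(Image={"Bytes": f.read()})
```

If `looks_truncated(detect_text_in_file("page.jpg"))` returns `True` for your page (the file name is a placeholder), the missing text is more likely caused by the limit than by the font changes.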

AWS now provides the Amazon Textract service, which is specifically intended for OCR on images and documents.
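For a single page image, a Textract call could look roughly like this. It is a sketch, not a complete solution: it assumes boto3 and configured credentials, uses the synchronous `detect_document_text` operation, and the helper names `extract_lines` and `textract_page` are hypothetical.

```python
def extract_lines(response, min_confidence=0.0):
    """Return the text of every LINE block at or above min_confidence."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


def textract_page(path, region_name=None):
    """Send a local single-page image to Textract DetectDocumentText."""
    import boto3  # imported lazily so extract_lines works without AWS access

    client = boto3.client("textract", region_name=region_name)
    with open(path, "rb") as f:
        return client.detect_document_text(Document={"Bytes": f.read()})
```

Calling `extract_lines(textract_page("page.jpg"), min_confidence=80)` would then give the recognized lines in the order Textract returns them, skipping low-confidence ones (the file name and threshold are placeholders).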