I have used the Google Cloud Vision API for document text detection, but I could not figure out whether it lets us define a particular area of the image from which to extract text. For example, my image has 3 columns of text and I want to provide the top-left coordinates, width and height of a particular column on which to perform OCR. Is this possible? Also, is there any other way to avoid jumbled-up text when the image has 3 columns of text?
2 Answers
Currently, it is not possible to define a particular area of the image from which to extract text. There is no parameter for that in the image context in either the REST or gRPC APIs. A possible workaround is to crop your image and send only the text you want to transcribe. If you want to automate this process, the object localization or crop hints features may be of use.
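To make the crop workaround concrete, here is a minimal sketch. It assumes Pillow is installed and that you already know the column's top-left corner, width and height in pixels. The helper names `region_to_box` and `crop_to_png_bytes` are made up for this example and are not part of any Google library.

```python
import io


def region_to_box(left, top, width, height, image_width, image_height):
    """Convert (left, top, width, height) into a (left, upper, right, lower)
    box, clamped so it never extends past the image bounds."""
    x0 = max(0, min(left, image_width))
    y0 = max(0, min(top, image_height))
    x1 = max(x0, min(left + width, image_width))
    y1 = max(y0, min(top + height, image_height))
    return (x0, y0, x1, y1)


def crop_to_png_bytes(path, left, top, width, height):
    """Crop the region out of the image at `path` and return it as PNG bytes."""
    from PIL import Image  # Pillow, imported lazily; assumed to be installed

    with Image.open(path) as img:
        box = region_to_box(left, top, width, height, img.width, img.height)
        buffer = io.BytesIO()
        img.crop(box).save(buffer, format="PNG")
    return buffer.getvalue()
```

The returned bytes can then be sent as the image content of a normal document text detection request, once per column, so each response only contains that column's text.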
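Regarding the jumbled-up text, you may be able to locate each block or paragraph in the JSON response. The `fullTextAnnotation` groups the detected text into pages, blocks and paragraphs, each with its own bounding box, so you can reassemble the columns yourself.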
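As a rough sketch of that idea, the function below keeps only the paragraphs whose centre falls inside a given rectangle and returns them top to bottom. It assumes `response` is one entry of the `responses` array from the REST JSON, already parsed into a dict. `text_in_region` and its helpers are hypothetical names for this example, and joining words with single spaces is a simplification that ignores the detected break information.

```python
def paragraph_text(paragraph):
    """Rebuild a paragraph's text from its words and symbols."""
    words = []
    for word in paragraph.get("words", []):
        words.append("".join(s.get("text", "") for s in word.get("symbols", [])))
    return " ".join(words)


def bounds(bounding_box):
    """Return (min_x, min_y, max_x, max_y), or None if there are no vertices.
    A coordinate that is missing from the JSON is treated as 0."""
    vertices = bounding_box.get("vertices", [])
    if not vertices:
        return None
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def text_in_region(response, left, top, width, height):
    """Collect paragraphs whose centre lies inside the rectangle, top to bottom."""
    right, bottom = left + width, top + height
    hits = []
    annotation = response.get("fullTextAnnotation", {})
    for page in annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                box = bounds(paragraph.get("boundingBox", {}))
                if box is None:
                    continue
                x0, y0, x1, y1 = box
                cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
                if left <= cx < right and top <= cy < bottom:
                    hits.append((y0, paragraph_text(paragraph)))
    hits.sort(key=lambda hit: hit[0])
    return "\n".join(text for _, text in hits)
```

Calling it once per column rectangle gives you each column's text separately, without cropping the image or making extra API calls.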