3
votes

I am interested in the TEXT_DETECTION feature of the Google Vision API; it works impressively. But it seems that TEXT_DETECTION only gives accurate results when the text is in English. In my case, I want to use TEXT_DETECTION in a fairly narrow context, for example detecting text on ad banners in a specific language (Vietnamese in my case). Can I train the model on my own data collection to get more accurate results? And how would I implement this?

Besides TEXT_DETECTION in the Google Vision API, Google also has its Optical Character Recognition (OCR) software, which depends on Tesseract. As far as I know, they use different algorithms to detect text. I used both Google Docs and TEXT_DETECTION of the Google Vision API to read text (in Vietnamese) from a picture. Google Docs gave a good result but the Vision API didn't. Why doesn't the Google Vision API inherit the advantages of Google's OCR?

I want to say something more about Google Vision API text detection, in case a Google expert is here and reads this. In its announcement, Google described TEXT_DETECTION as fantastic: "Even though the words in this image were slanted and unclear, the OCR extracts the words and their positions correctly. It even picks up the word "beacon" on the presenter's t-shirt". But for some of my pics, what happened was really funny. For example, in this pic, even though the words "Kem Oxit" are very big in the center of the pic, they were not recognized. Or in this pic, the red text "HOA CHAT NGOC VIET" in the center of the pic was not recognized either. There must be something wrong with the text detection algorithm.


2 Answers

2
votes

Have you experimented with LanguageHints (see the documentation)?

Vietnamese is in the list of supported languages. If the text is always in Vietnamese, this should improve the quality of text detection.

If that doesn't help, there is no way to improve the quality of text detection by supplying your own training examples.
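
To make the language hint suggestion concrete, here is a minimal sketch of a request body for the Vision API REST endpoint `images:annotate`, with the hint placed in `imageContext.languageHints`. The helper name `build_text_detection_request` and the placeholder image bytes are my own for illustration; sending the request (authentication, HTTP client) is left out, so treat this as a starting point rather than a complete client.

```python
import base64
import json

# REST endpoint for image annotation; authentication is not shown here.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_text_detection_request(image_bytes, language_hints=("vi",)):
    """Build a TEXT_DETECTION request body with language hints.

    Hypothetical helper: the name and default are illustrative only.
    "vi" is the language code for Vietnamese.
    """
    return {
        "requests": [
            {
                # Image content must be base64-encoded in the JSON body.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
                "imageContext": {"languageHints": list(language_hints)},
            }
        ]
    }


# Placeholder bytes; in practice, read the banner image from disk.
body = build_text_detection_request(b"placeholder image bytes")
payload = json.dumps(body)  # POST this to VISION_ENDPOINT with your credentials
```

Comparing the response with and without the hint on the same banner image is a quick way to see whether it helps in your case.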

1
votes

Fematich is right: it is currently not possible to train the TEXT_DETECTION feature of the Google Vision API.

As for the Optical Character Recognition software, according to this link it is what the Google Vision API uses for TEXT_DETECTION. For better results, it is worth checking whether any of the best practices apply to your picture. Google Docs may use a different pre-processing mechanism, which would be worth asking about on the Google Docs Help Forum.

At 375x500 pixels, the first image does not meet the minimum resolution of 640x480 pixels described in the best practices. Still, after rescaling it to 1024x1365 pixels, the Google Vision API was able to detect the word "Oxit". After rescaling the second image to the OCR-recommended size of 1024x768 pixels for character recognition, the API again succeeded in detecting the words "HOA CHAT NGOC VIET". Note that in the future this type of question would be more appropriate for the Public Issue Tracker, as it may require further details in order to reproduce your exact errors.
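
For anyone who wants to reproduce the rescaling step, here is a rough sketch. It assumes the aspect ratio is preserved and the width is scaled to 1024 pixels, which matches the 375x500 to 1024x1365 case above. The function names are hypothetical, and `upscale_image` assumes the Pillow library is installed; the resampling filter is my own choice, not something the best practices prescribe.

```python
def scale_to_width(width, height, target_width=1024):
    """Return (new_width, new_height) with the aspect ratio preserved."""
    scale = target_width / width
    return target_width, round(height * scale)


def upscale_image(src_path, dst_path, target_width=1024):
    """Resize an image file before sending it to the API.

    Assumption: Pillow is available; it is imported here so that the
    size calculation above works without it.
    """
    from PIL import Image

    with Image.open(src_path) as img:
        new_size = scale_to_width(img.width, img.height, target_width)
        img.resize(new_size, Image.LANCZOS).save(dst_path)
    return new_size


# The first image from the question is 375x500 pixels.
print(scale_to_width(375, 500))  # (1024, 1365)
```

Upscaling does not add detail to the image, so results will likely vary; capturing the picture at a higher resolution in the first place is the safer option where possible.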