Differences between Google Cloud Vision OCR in browser demo and via python

Question

I am fairly new to the Google Cloud Vision API so my apologies if there is an obvious answer to this. I am noticing that for some images I am getting different OCR results between the Google Cloud Vision API Drag and Drop (https://cloud.google.com/vision/docs/drag-and-drop) and from local image detection in python.

My code is as follows

import io
# Imports the Google Cloud client library
from google.cloud import vision
from google.cloud.vision import types

# Instantiates a client
client = vision.ImageAnnotatorClient()

# The name of the image file to annotate
file_name = "./test0004a.jpg"

# Loads the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

image = types.Image(content=content)

response = client.text_detection(image=image)
texts = response.text_annotations

print('Texts:')
for text in texts:
#    print('\n"{}"'.format(text.description.encode('utf-8')))
    print('\n"{}"'.format(text.description.encode('ascii','ignore')))

    vertices = (['({},{})'.format(vertex.x, vertex.y)
                for vertex in text.bounding_poly.vertices])

    print('bounds: {}'.format(','.join(vertices)))

A sample image that highlights this is attached Sample Image

The python code above doesn't return anything, but in the browser using drag and drop it correctly identifies "2340" as the text. Shouldn't both python and the browser return the same result?. And if not, why not?, Do I need to include additional parameters in the code?.

dsesto dsesto · Accepted Answer · 2018-06-15T08:57:53

The issue here is that you are using TEXT_DETECTION instead of DOCUMENT_TEXT_DETECTION, which is the feature being used in the Drag and Drop example page that you shared.

By changing the method (to document_text_detection()), you should obtain the desired results (I have tested it with your code, and it did work):

# Using TEXT_DETECTION
response = client.text_detection(image=image)

# Using DOCUMENT_TEXT_DETECTION
response = client.document_text_detection(image=image)

Although both methods can be used for OCR, as presented in the documentation, DOCUMENT_TEXT_DETECTION is optimized for dense text and documents. The image you shared is not a really high-quality one, and the text is not clear, therefore it may be that for this type of images, DOCUMENT_TEXT_DETECTION offers a better performance than TEXT_DETECTION.

See some other examples where DOCUMENT_TEXT_DETECTION worked better than TEXT_DETECTION. In any case, please note that it might not always be the situation, and TEXT_DETECTION may still have better results under certain conditions:

Differences between Google Cloud Vision OCR in browser demo and via python

1 Answers