0
votes

I have been using the Google Cloud OCR tutorial (https://cloud.google.com/functions/docs/tutorials/ocr) for one of my projects to extract text from a scanned image in .png format. I followed every instruction to set up the Cloud Functions for OCR. However, when I upload an image to the input image bucket in Cloud Storage, no result appears in the result bucket. I noticed that if I pass "fr" or "es" as the TO_LANG parameter in the deploy command below, the result does show up in the result bucket. With "TO_LANG=en", no result appears.

Deploying an image processing function with a Cloud Storage trigger:

gcloud functions deploy ocr-extract --runtime python37 --trigger-bucket etdimage --entry-point process_image --set-env-vars "TRANSLATE_TOPIC=extractData,TO_LANG=en"

But the result needs to be in English. Is there any way to work around this issue? I attached an image for your convenience. I would appreciate your help.

Thank you,

Muntabir Choudhury

[Image: sample scanned document]

1
I just followed the same tutorial with the only change of setting TO_LANG=en and uploaded the images provided in the repo and everything worked fine for me. Could you please check if your local copy of the repo code is up to date and retry the tutorial? – Happy-Monad
Hi Monad, thanks for your response. I did use the updated code repo. It worked with the sample image provided with the Google Cloud OCR tutorial. That sample is in French, so the function was able to translate it into English. However, does it work with English-to-English translation? My samples are scanned images in English. I saw a KeyError in the logs when I tried "TO_LANG=en", but when I tried "TO_LANG=fr" the file was saved successfully in the output bucket. Any thoughts? I added a scanned image to my post for your convenience. – Muntabir Choudhury

1 Answer

1
votes

Thanks for sharing the image and the insightful comment.

I uploaded the shared image to the bucket and confirmed that no file was created in the results bucket. I could also see a KeyError exception in the logs, which caused the ocr-extract function to crash.

After examining the repo code, I found that in lines 54-70, when the source language is the same as the target language, the function sends the result directly to the ocr-save function instead of calling ocr-translate. The issue is that the environment variable RESULT_TOPIC was not defined when the function was created, so reading it raised the KeyError and crashed the function.
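To make the failure mode concrete, here is a minimal sketch of the topic-selection logic described above. It is not the repo's code: the helper name pick_topic and the plain-dict env are hypothetical stand-ins for the function's real environment lookup, and I am assuming the variables are read with a plain key lookup, which is what a KeyError suggests.

```python
def pick_topic(env, src_lang, target_lang):
    """Return the topic the extracted text would be published to.

    `env` is a plain dict standing in for the function's environment
    variables (hypothetical; used here so the sketch is easy to run).
    """
    topic_name = env["TRANSLATE_TOPIC"]
    if src_lang == target_lang:
        # No translation needed, so the text goes straight to the
        # results topic. This lookup fails if RESULT_TOPIC is not set.
        topic_name = env["RESULT_TOPIC"]
    return topic_name


# Environment as deployed in the question: RESULT_TOPIC is missing.
deployed_env = {"TRANSLATE_TOPIC": "extractData", "TO_LANG": "en"}

# English text with TO_LANG=fr only needs TRANSLATE_TOPIC, so it works.
print(pick_topic(deployed_env, "en", "fr"))

# English text with TO_LANG=en takes the RESULT_TOPIC branch and fails.
try:
    pick_topic(deployed_env, "en", "en")
except KeyError as missing:
    print("KeyError:", missing)
```

This would explain why "fr" and "es" produced output for an English scan while "en" did not: only the same-language branch touches RESULT_TOPIC.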

To resolve it, go to the functions tab of the Console UI, edit the ocr-extract function, and add the RESULT_TOPIC environment variable. Alternatively, you may redeploy the function with the Cloud SDK using this command:

gcloud functions deploy ocr-extract \
--runtime python37 \
--trigger-bucket YOUR_IMAGE_BUCKET_NAME \
--entry-point process_image \
--set-env-vars "^:^RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME:TRANSLATE_TOPIC=YOUR_TRANSLATE_TOPIC_NAME:TO_LANG=en"
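A note on the "^:^" prefix in the command above: it is gcloud's alternate-delimiter syntax (see gcloud topic escaping), which switches the separator between pairs from "," to ":" so that a value such as TO_LANG=es,en,fr can contain commas. The sketch below only illustrates that idea. parse_env_vars is a hypothetical helper, not gcloud's actual parser.

```python
def parse_env_vars(value):
    """Split a --set-env-vars style string into a dict.

    Illustration only: mimics the "^DELIM^" prefix convention and is
    not gcloud's real implementation.
    """
    delimiter = ","
    if value.startswith("^"):
        end = value.find("^", 1)
        if end > 1:
            # Text between the two carets is the custom delimiter.
            delimiter = value[1:end]
            value = value[end + 1:]
    pairs = {}
    for item in value.split(delimiter):
        key, _, val = item.partition("=")
        pairs[key] = val
    return pairs


print(parse_env_vars("^:^RESULT_TOPIC=r:TRANSLATE_TOPIC=t:TO_LANG=es,en,fr"))
```

With a single target language, as in this question, the default comma delimiter would also work. The prefix only matters once TO_LANG lists several languages.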

I have requested an update for the tutorial documentation so that this situation is avoided in the future.