Can OCR be performed on a video using Google Cloud Vision or Video intelligence API?

Question

I have used Google's Vision OCR a lot and it is really very accurate. I was wondering if I can do the OCR on a video file or video stream. Say, I have some surveillance video and I want to get all the text throughout that video. In Google's Video intelligence API, I can only get the labels, which I am guessing is using label detection API of Google Vision. I think there might be challenges to OCR on every frame of video, but still wanted to try to start a discussion on how it can be done. There might not be a perfect solution, but even if we get 50% of it, it's better than nothing.

The best result will be achieved if we reconstruct text-containing surfaces first by connecting parts of them that were captured in different frames. Then combining several shots of the same surface segment we can get rid of mpeg artifacts, etc. — Nakilon
@Nakilon Can you please elaborate a bit more? I only got it partially. How can I connect parts of it captured in different frames? I can only get text containing parts. But tracking!? — Anmol Agrawal
You would need to find a solution for that. Google does not provide such service. — Nakilon

Victor M Herasme Perez Victor M Herasme Perez · Accepted Answer · 2017-12-11T10:41:23

Here is what I did:

Go to this website an download this sample free Video: https://www.videvo.net/video/people-walking-past-the-911-memorial-sign-in-new-york/5283/
Download and install VLC video player
Follow the steps in this tutorial to extract the images from the video:

a. Go to tools -> preferences. In the lower left cofner click the radio button 'All'.

b. Click on the video category on the left in order to expand it. Click again in 'filters' in order to expand it.

c. Select the 'scene filter' and choose the settings (see the image below).

d. Click the filters category and select the 'Scene video filter' checkbox(See image below)

e. After clicking 'Save'in the lower right corner, open the video you downloaded and play it. The images will be saved automatically.

More details here.
Go to this CLOUD VISION API page, and you can drag and drop any of the generated images to see a sample of the API capabilities.

Can OCR be performed on a video using Google Cloud Vision or Video intelligence API?

3 Answers