0
votes

I intend to use Google Cloud Speech Transcription for Video Intelligence. The following code only analysis for a partial segment of the video.

video_uri = "gs://cloudmleap/video/next/JaneGoodall.mp4"
language_code = "en-GB"
segment = types.VideoSegment()
segment.start_time_offset.FromSeconds(55)
segment.end_time_offset.FromSeconds(80)
response = transcribe_speech(video_uri, language_code, [segment])

def transcribe_speech(video_uri, language_code, segments=None):
    video_client = videointelligence.VideoIntelligenceServiceClient()
    features = [enums.Feature.SPEECH_TRANSCRIPTION]
    config = types.SpeechTranscriptionConfig(
        language_code=language_code,
        enable_automatic_punctuation=True,
    )
    context = types.VideoContext(
        segments=segments,
        speech_transcription_config=config,
    )

    print(f'Processing video "{video_uri}"...')
    operation = video_client.annotate_video(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )
    return operation.result()

How can I automatically analyse the whole video rather than defining a particular segment ?

1

1 Answers

1
votes

You can follow this tutorial in Video Intelligence google doc. This tutorial shows how to transcribe a whole video. Your input should be stored in a GCS bucket and I see that in your sample code, your video is indeed stored in a GCS bucket so you should not have any issues with this.

Just make sure that you have installed the latest Video Intelligence library.

pip install --upgrade google-cloud-videointelligence

Here is the the code snippet from the Video Intelligence doc for transcribing audio:

"""Transcribe speech from a video stored on GCS."""
from google.cloud import videointelligence

path="gs://your_gcs_bucket/your_video.mp4"
video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.SPEECH_TRANSCRIPTION]

config = videointelligence.SpeechTranscriptionConfig(
    language_code="en-US", enable_automatic_punctuation=True
)
video_context = videointelligence.VideoContext(speech_transcription_config=config)

operation = video_client.annotate_video(
    request={
        "features": features,
        "input_uri": path,
        "video_context": video_context,
    }
)

print("\nProcessing video for speech transcription.")

result = operation.result(timeout=600)

# There is only one annotation_result since only
# one video is processed.
annotation_results = result.annotation_results[0]
for speech_transcription in annotation_results.speech_transcriptions:

    # The number of alternatives for each transcription is limited by
    # SpeechTranscriptionConfig.max_alternatives.
    # Each alternative is a different possible transcription
    # and has its own confidence score.
    for alternative in speech_transcription.alternatives:
        print("Alternative level information:")

        print("Transcript: {}".format(alternative.transcript))
        print("Confidence: {}\n".format(alternative.confidence))

        print("Word level information:")
        for word_info in alternative.words:
            word = word_info.word
            start_time = word_info.start_time
            end_time = word_info.end_time
            print(
                "\t{}s - {}s: {}".format(
                    start_time.seconds + start_time.microseconds * 1e-6,
                    end_time.seconds + end_time.microseconds * 1e-6,
                    word,
                )
            )