My goal is to process several videos using a speech-to-text model.
Google confusingly has two products that seem to do the same thing.
What are the major differences between these offerings?
Google Cloud Speech-to-Text: https://cloud.google.com/speech-to-text/docs/basics
- Speech-to-Text has an "enhanced video" model for interpreting the audio.
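For reference, this is roughly what selecting the enhanced video model looks like in a Speech-to-Text v1 REST request. It is a minimal sketch that only builds the JSON body for the `speech:longrunningrecognize` method and does not send it. The function name `build_speech_request` and the `gs://my-bucket/...` path are hypothetical. It also assumes the audio track has already been pulled out of the video into a 16 kHz, 16-bit PCM (LINEAR16) WAV file, because Speech-to-Text takes audio rather than a video container.

```python
import json

# Hypothetical Cloud Storage path to audio already extracted from one video.
AUDIO_URI = "gs://my-bucket/video1.wav"

# Speech-to-Text v1 REST method for asynchronous (long-running) recognition.
SPEECH_ENDPOINT = "https://speech.googleapis.com/v1/speech:longrunningrecognize"


def build_speech_request(audio_uri: str, language_code: str = "en-US") -> dict:
    """Build a request body that asks for the enhanced "video" model (hypothetical helper)."""
    return {
        "config": {
            "encoding": "LINEAR16",       # assumes 16-bit PCM WAV input
            "sampleRateHertz": 16000,     # assumes 16 kHz audio
            "languageCode": language_code,
            "model": "video",             # the video recognition model
            "useEnhanced": True,          # request the enhanced variant of that model
        },
        "audio": {"uri": audio_uri},
    }


if __name__ == "__main__":
    body = build_speech_request(AUDIO_URI)
    # Sending it would mean POSTing this JSON to SPEECH_ENDPOINT with an OAuth
    # access token and then polling the returned long-running operation (not shown).
    print(json.dumps(body, indent=2))
```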
Google Video Intelligence: https://cloud.google.com/video-intelligence/docs/feature-speech-transcription
- VI has the option to request a SPEECH_TRANSCRIPTION feature.
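For comparison, this is roughly what requesting the SPEECH_TRANSCRIPTION feature looks like in a Video Intelligence v1 REST request. Like the Speech-to-Text sketch, it only builds the JSON body for the `videos:annotate` method and does not send it. The function name `build_vi_request` and the `gs://my-bucket/...` path are hypothetical, and `enableAutomaticPunctuation` is included only as an example option.

```python
import json

# Hypothetical Cloud Storage path to one of the videos.
VIDEO_URI = "gs://my-bucket/video1.mp4"

# Video Intelligence v1 REST method for annotating a video.
VI_ENDPOINT = "https://videointelligence.googleapis.com/v1/videos:annotate"


def build_vi_request(video_uri: str, language_code: str = "en-US") -> dict:
    """Build a request body that asks only for speech transcription (hypothetical helper)."""
    return {
        "inputUri": video_uri,                  # the video file itself, not extracted audio
        "features": ["SPEECH_TRANSCRIPTION"],   # other features could be listed alongside it
        "videoContext": {
            "speechTranscriptionConfig": {
                "languageCode": language_code,  # required by SpeechTranscriptionConfig
                "enableAutomaticPunctuation": True,
            }
        },
    }


if __name__ == "__main__":
    body = build_vi_request(VIDEO_URI)
    # Sending it would mean POSTing this JSON to VI_ENDPOINT with an OAuth
    # access token and then polling the returned long-running operation (not shown).
    print(json.dumps(body, indent=2))
```

Put side by side, the visible difference in the requests is the input: Video Intelligence is given the video file directly, while the Speech-to-Text request above assumes a separate audio-extraction step.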