1
votes

I am using the Google Cloud Speech-to-Text API.

Below is my Python code:

from google.cloud import speech_v1p1beta1 as speech

import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="C:\\Users\\chetan.patil\\Speech Recognition-db71b5de7c80.json" #Specified key

client=speech.SpeechClient()

speech_file="Chetan_Recording_20Secflac.flac" #import file

with open(speech_file,'rb') as audio_file:
    content=audio_file.read()
    audio=speech.types.RecognitionAudio(content=content)

config=speech.types.RecognitionConfig(encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
                                      language_code='en_US',enable_speaker_diarization=True,audio_channel_count=1,
                                      sample_rate_hertz=44100)

response = client.recognize(config, audio)

When I run the last line of code, it gives the error "400 Specify FLAC encoding to match file header".

I also tried a .wav file, but then it gives the error "400 Must use single channel (mono) audio, but WAV header indicates 2 channels".

Can anyone please help me with this?


2 Answers

1
votes

Removing the encoding setting entirely also seems to work. That is, drop encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16 from the config, since the encoding can be inferred from the header of the audio file.
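If you want to see what the header actually says before sending the file, here is a minimal sketch using only the standard library. The helper name describe_audio_header is made up for illustration and is not part of the Google client. It only recognizes native FLAC (by its "fLaC" marker) and PCM WAV files that Python's wave module can open.

```python
import wave


def describe_audio_header(path):
    """Return basic facts read from the file header (hypothetical helper)."""
    with open(path, "rb") as f:
        magic = f.read(12)

    if magic[:4] == b"fLaC":
        # Native FLAC streams begin with the 4-byte marker "fLaC".
        return {"container": "FLAC"}

    if magic[:4] == b"RIFF" and magic[8:12] == b"WAVE":
        # wave.open raises wave.Error for WAV variants it cannot parse.
        with wave.open(path, "rb") as wav:
            return {
                "container": "WAV",
                "channels": wav.getnchannels(),
                "sample_rate_hertz": wav.getframerate(),
                "sample_width_bytes": wav.getsampwidth(),
            }

    return {"container": "unknown"}
```

Calling describe_audio_header on the file from the question should report FLAC if the error message is accurate, and on the .wav file it would show the channel count and sample rate that the service reads from the header.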

0
votes

When I run the last line of code, it gives the error "400 Specify FLAC encoding to match file header".

You need speech.enums.RecognitionConfig.AudioEncoding.FLAC, not LINEAR16, to process FLAC files.

I also tried a .wav file, but then it gives the error "400 Must use single channel (mono) audio, but WAV header indicates 2 channels".

The WAV file does need to be mono here. It looks like you tried a stereo file.
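If you only have the stereo recording, one option is to downmix it to mono before uploading. Below is a rough sketch with the standard library, assuming the input is 16-bit PCM stereo; the function name stereo_to_mono is just illustrative. It averages the left and right samples, which is a simple approach and not the only way to downmix.

```python
import array
import sys
import wave


def stereo_to_mono(src_path, dst_path):
    """Average the two channels of a 16-bit PCM stereo WAV into a mono WAV."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM input")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Samples are interleaved: left, right, left, right, ...
    left, right = samples[0::2], samples[1::2]
    mono = array.array("h", ((l + r) // 2 for l, r in zip(left, right)))
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)  # keep the original sample rate
        dst.writeframes(mono.tobytes())
```

The output file keeps the original sample rate, so sample_rate_hertz in the config should match whatever the source file used.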