How to use Google's Text-to-Speech API in Python

Question

I have my key ready to make requests to Google and get speech from text.
I tried these commands and many more.
I haven't found a straightforward way in the docs to get started with Python. I don't know where my API key goes in relation to the JSON and the URL.

One solution in their docs here is for cURL, but it involves downloading a txt file after the request that has to be sent back to them in order to get the audio file. Is there a way to do this in Python that doesn't involve the txt file I have to return to them? I just want my list of strings returned as audio files.

(I put my actual key in the block above. I'm just not going to share it here.)

I saw this. I meant in the OP, there is no Python equivalent in the link I posted. I don't understand what this link is, (the code). I don't understand where my API key goes. Maybe that's all I need. Where does this code see the API? I haven't been able to find a way into using this stuff anywhere after looking all day. — Renoldus
When you say API Key I assume you mean the API key you set up when setting up Google Cloud correct? It might be worth reading the full set up from the beginning. The API key you downloaded in JSON format is something you set in your environment as GOOGLE_APPLICATION_CREDENTIALS (see step 2). They then have further instructions in how you get your Python environment set up correctly. — aug
As aug pointed out, there is a Python quickstart at the link that they provided. The Python quickstart provides equivalent functionality to the CURL sample that you linked to. Also as aug mentions, you need to use a service account, not an API key. — Eric Schmidt
Eric, you write these docs? Respectfully, They are very opaque and confusing. Hard to find. It's like there are 3 decoy versions of everything. Not linked to where I signed up. Yesterday was 10 hours trying to get Python to do what that CURL command did. Today, about 8 hours spent trying to figure out where to enter the voice name and that it was dif than language_code. You downvoted my question? — Renoldus

CodeRaptor · Accepted Answer · 2019-02-15T02:08:23

Configure Python App for JSON file and Install Client Library

Create a Service Account
Create a Service Account Key for that Service Account here
The JSON key file downloads automatically; save it securely
Point your Python app at the JSON file through the GOOGLE_APPLICATION_CREDENTIALS environment variable
Install the library: pip install --upgrade google-cloud-texttospeech
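
Before calling the API, it can help to confirm that the credentials step above actually took effect. The following is a minimal sketch, not part of Google's samples: `check_credentials` is a hypothetical helper name, and it only assumes that a service account key file is JSON containing `"type": "service_account"` and a `client_email` field.

```python
import json
import os


def check_credentials(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return the service account email if the key file looks usable.

    Hypothetical helper: it only inspects the local file and never
    contacts Google.
    """
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"{env_var} points to a missing file: {path}")
    with open(path, encoding="utf-8") as fh:
        info = json.load(fh)
    # Service account key files identify themselves with this "type" value.
    if info.get("type") != "service_account":
        raise RuntimeError("Key file is not a service account key")
    return info.get("client_email")
```

If this raises, the client library would most likely fail to authenticate as well, so it is a cheap first check when nothing seems to work.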

This uses Google's Python examples, found at https://cloud.google.com/text-to-speech/docs/reference/libraries and https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/texttospeech/cloud-client/quickstart.py
Note: Google's example does not set the name parameter correctly.

Below is a modified version of that example, using Google application credentials and a female WaveNet voice.

import os

from google.cloud import texttospeech

# Point the client library at your service account key file
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/yourproject-12345.json"

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Do no evil!")

# Build the voice request: select the language code ("en-US"),
# the voice NAME ("en-US-Wavenet-C"),
# and the SSML voice gender (FEMALE)
voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    name='en-US-Wavenet-C',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)

# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)

# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
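
The question asked for a list of strings turned into audio files. Here is a rough sketch of one way to loop over such a list. It assumes version 2.0 or later of google-cloud-texttospeech, where, as far as I know, the `types` and `enums` submodules used above were removed and `synthesize_speech` takes keyword arguments. The names `make_synthesizer` and `save_clips` and the `clip_001.mp3` naming scheme are made up for illustration.

```python
import os


def make_synthesizer(voice_name="en-US-Wavenet-C", language_code="en-US"):
    """Build a text -> MP3 bytes function backed by Google's API.

    Assumes google-cloud-texttospeech >= 2.0 and valid credentials.
    The import is local so the file-writing helper below can be used
    (and tested) without the library installed.
    """
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()  # created once, reused per call
    voice = texttospeech.VoiceSelectionParams(
        language_code=language_code,
        name=voice_name,
        ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    def synthesize(text):
        response = client.synthesize_speech(
            input=texttospeech.SynthesisInput(text=text),
            voice=voice,
            audio_config=audio_config,
        )
        return response.audio_content  # raw MP3 bytes

    return synthesize


def save_clips(texts, synthesize, out_dir="."):
    """Write one MP3 per string and return the list of file paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, text in enumerate(texts, start=1):
        path = os.path.join(out_dir, f"clip_{index:03d}.mp3")
        with open(path, "wb") as out:
            out.write(synthesize(text))
        paths.append(path)
    return paths
```

Usage would look like `save_clips(["Hello", "Goodbye"], make_synthesizer(), "audio")`. Passing the synthesizer in as an argument keeps the file-writing loop independent of the network call.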

Voices, Name, Language Code, SSML Gender, etc.

List of Voices: https://cloud.google.com/text-to-speech/docs/voices
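
The same list can also be fetched from the API with the client's `list_voices` method, which makes it easier to find a valid value for the name parameter. Below is a small sketch; `fetch_voices` and `wavenet_names` are hypothetical helper names, and the filter simply assumes that WaveNet voice names contain the substring "Wavenet", as `en-US-Wavenet-C` does.

```python
def fetch_voices(language_code="en-US"):
    """Return the voices Google reports for a language (needs credentials)."""
    # Imported here so the filter below works without the library installed.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    return client.list_voices(language_code=language_code).voices


def wavenet_names(voices, language_code="en-US"):
    """Return sorted names of WaveNet voices that support language_code.

    Each voice is expected to expose .name and .language_codes, as the
    Voice objects returned by list_voices do.
    """
    return sorted(
        voice.name
        for voice in voices
        if language_code in voice.language_codes and "Wavenet" in voice.name
    )
```

Calling `wavenet_names(fetch_voices())` should give candidate strings to pass as `name` in `VoiceSelectionParams`.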

In the code example above, I changed the voice from Google's example code to include the name parameter, to use a WaveNet voice (much improved, but more expensive at $16 per million characters), and to set the SSML gender to FEMALE.

voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    name='en-US-Wavenet-C',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)
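
Since WaveNet voices are billed per character, a quick estimate before converting a long list of strings may be worthwhile. This sketch uses the $16 per million characters figure quoted in this answer, which may be out of date. It also ignores any free tier and assumes every character of the input is billed; `estimate_cost` is a hypothetical helper.

```python
# Price quoted in this answer; check current pricing before relying on it.
WAVENET_USD_PER_MILLION_CHARS = 16.0


def estimate_cost(texts, usd_per_million=WAVENET_USD_PER_MILLION_CHARS):
    """Return a rough cost in USD for synthesizing every string in texts."""
    total_chars = sum(len(text) for text in texts)
    return total_chars * usd_per_million / 1_000_000
```

For example, a batch totalling 250,000 characters would come to roughly $4 at that rate.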
