IBM Watson Text to Speech API Python

Question

I'm trying to adjust the pitch of the IBM Watson Text to Speech voice, but I can't find any documentation on how to do this.

If you visit this link then you can see that there is an option to adjust the pitch/speed.

The code I have is simply this:

from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('api_key')
text_to_speech = TextToSpeechV1(
    authenticator=authenticator
)

text_to_speech.set_service_url('service_url')

sample = "insert what you want to say here"

with open('test.wav', 'wb') as audio_file:
    audio_file.write(
        text_to_speech.synthesize(
            sample,
            voice='en-GB_JamesV3Voice',
            accept='audio/wav'
        ).get_result().content)

I have no idea which parameters to adjust to make the voice lower. Thank you so much!

AnonymouseUser · Accepted Answer · 2021-01-03T22:17:29

What you are looking for is the SSML prosody element, which you embed directly in the text you pass to synthesize. Neural voices (V3) only use the pitch and rate attributes.

Using your example:

sample = 'Here is a <prosody pitch="150Hz"> modified pitch </prosody> example.'

sample = 'Here is a <prosody rate="x-slow"> modified rate </prosody> example.'
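To tie this back to the code in the question, here is a minimal sketch of building the SSML string and passing it to the same synthesize call. The helpers with_prosody and synthesize_to_file are hypothetical names made up for illustration, not part of the ibm_watson SDK, and the keyword values "x-low" and "slow" assume the service accepts the standard SSML prosody keywords for the voice you are using.

```python
from xml.sax.saxutils import escape, quoteattr


def with_prosody(text, pitch=None, rate=None):
    """Wrap text in an SSML <prosody> element (hypothetical helper, not an SDK function)."""
    attrs = []
    if pitch is not None:
        attrs.append(f"pitch={quoteattr(pitch)}")
    if rate is not None:
        attrs.append(f"rate={quoteattr(rate)}")
    if not attrs:
        # Nothing to modify: return the escaped text unchanged.
        return escape(text)
    # Escape &, < and > so the spoken text cannot break the markup.
    return f"<prosody {' '.join(attrs)}>{escape(text)}</prosody>"


def synthesize_to_file(text_to_speech, ssml, path, voice='en-GB_JamesV3Voice'):
    """Write the audio to path using the same synthesize call as in the question."""
    response = text_to_speech.synthesize(ssml, voice=voice, accept='audio/wav')
    with open(path, 'wb') as audio_file:
        audio_file.write(response.get_result().content)


# Assumed values: "x-low" and "slow" are standard SSML prosody keywords.
sample = with_prosody("insert what you want to say here", pitch="x-low", rate="slow")
```

With the text_to_speech object from the question already configured, calling synthesize_to_file(text_to_speech, sample, 'test.wav') should produce the modified audio. If the keywords don't give the result you want, a value in Hz such as the "150Hz" shown above can be passed as pitch instead.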
