I want to do speech <-> text with mixed-language inputs.
Initially only Chinese and English, but eventually more language pairs. The vast majority of the speech will be English, with small amounts of Chinese mixed in. The application is a kind of "conversational verbal dictionary":
- speech-to-text with mixed-language input: "How do you say 猫?"
- text-to-speech with mixed-language input: "The English word for 猫 is Cat." I would want this to be spoken with the voice/accent of a native English speaker.
  - I noticed that the text-to-speech demo at this URL can handle sentences like this IF you choose the "Chinese-CN", "Chinese-HK", or "Chinese-TW" accent, but not if you choose any of the "English-*" accents. This doesn't work for me because I need a native English-speaking accent ...
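
To make the text-to-speech case concrete, here is a minimal sketch of one approach I could imagine, not something I have confirmed works with any particular engine: split the text into English and Chinese runs by Unicode range, then wrap the Chinese runs in the SSML 1.1 `<lang>` element while keeping English as the base language. The function names `split_by_script` and `to_ssml`, the `en-US`/`zh-CN` tags, and the assumption that any Han character should be treated as Chinese are all mine, for illustration only.

```python
import re
from xml.sax.saxutils import escape

# CJK Unified Ideographs (basic block plus Extension A).
# Assumption: any character in these ranges is treated as Chinese.
_HAN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]+")


def split_by_script(text):
    """Split text into (lang, chunk) runs, where lang is 'en' or 'zh'."""
    runs = []
    pos = 0
    for match in _HAN.finditer(text):
        if match.start() > pos:
            # Everything between Han runs is assumed to be English.
            runs.append(("en", text[pos:match.start()]))
        runs.append(("zh", match.group()))
        pos = match.end()
    if pos < len(text):
        runs.append(("en", text[pos:]))
    return runs


def to_ssml(text, base_lang="en-US", zh_lang="zh-CN"):
    """Build an SSML 1.1 document that marks Chinese runs with <lang>."""
    parts = []
    for lang, chunk in split_by_script(text):
        if lang == "zh":
            parts.append('<lang xml:lang="%s">%s</lang>' % (zh_lang, escape(chunk)))
        else:
            parts.append(escape(chunk))
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="%s">%s</speak>' % (base_lang, "".join(parts))
    )


if __name__ == "__main__":
    print(to_ssml("The English word for 猫 is Cat."))
```

Whether a given engine honors `<lang>` while still speaking the rest with a native English voice is exactly the part I am unsure about.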