I'm trying to get an audio stream from a text-to-speech interface (MaryTTS) and stream it in a SIP RTP session (using Peers).
Peers wants a SoundSource to stream audio, which is an interface defined as
    public interface SoundSource {
        byte[] readData();
    }
and MaryTTS synthesises a String to an AudioInputStream. I tried simply reading the stream and buffering it out to Peers through a SoundSource implementation, along the lines of
    MaryInterface tts = new LocalMaryInterface();
    AudioInputStream audio = tts.generateAudio("This is a test.");
    SoundSource soundSource = new SoundSource() {
        @Override
        public byte[] readData() {
            try {
                byte[] buffer = new byte[1024];
                int n = audio.read(buffer);
                if (n == -1) {
                    return null; // end of stream
                }
                // trim a partial read so stale bytes aren't sent
                return n == buffer.length ? buffer : Arrays.copyOf(buffer, n);
            } catch (IOException e) {
                return null;
            }
        }
    };
    // issue call with soundSource using Peers
The phone rings, and I hear a slow, low, noisy sound instead of the synthesised speech. I suspect the problem is the audio format the SIP RTP session expects, since the Peers documentation states
The sound source must be raw audio with the following format: linear PCM 8kHz, 16 bits signed, mono-channel, little endian.
How can I convert/read the AudioInputStream to satisfy these requirements?
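From reading the javax.sound.sampled docs, I believe AudioSystem.getAudioInputStream(AudioFormat, AudioInputStream) is intended for exactly this kind of conversion, but I'm not sure it covers my case. Here is a sketch of what I have in mind; the 16 kHz big-endian source stream of silence is a made-up stand-in for the MaryTTS output, since I don't know its exact format:

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;

public class ConvertSketch {
    public static void main(String[] args) throws Exception {
        // Stand-in for the MaryTTS output stream: one second of
        // 16 kHz, 16-bit signed, mono, big-endian PCM silence.
        AudioFormat source = new AudioFormat(16000f, 16, 1, true, true);
        byte[] silence = new byte[16000 * 2];
        AudioInputStream in = new AudioInputStream(
                new ByteArrayInputStream(silence), source, silence.length / 2);

        // The format Peers wants: linear PCM 8 kHz, 16-bit signed,
        // mono, little-endian.
        AudioFormat target = new AudioFormat(8000f, 16, 1, true, false);

        if (AudioSystem.isConversionSupported(target, source)) {
            AudioInputStream converted =
                    AudioSystem.getAudioInputStream(target, in);
            System.out.println(converted.getFormat());
        } else {
            System.out.println("conversion not supported on this JVM");
        }
    }
}
```

Is this the right approach, or does the sample-rate conversion (16 kHz down to 8 kHz) need a dedicated resampler?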