I am using NAudio for my audio needs, but I've run into a thorny issue. I have a remote device that can receive RTP audio. I would like to stream an audio file to that device (after u-law or similar encoding + RTP wrapping). However, there doesn't seem to be a mechanism to maintain the outgoing timing for the RTP packets.
For example, a WaveOut player "manages" timing by simply responding to requests from the underlying sound/DirectX layers. In this manner, the timing is actually maintained by the sound drivers using a "pull" method.
What I'm looking for is a component that can provide the correct "pull" timing on (e.g.) an IWaveProvider (or similar) so that I can take each packet, RTP-ify it, and send it over the wire.
So, here's the core code:
    IPEndPoint target = new IPEndPoint(addr, port);
    Socket sender = new Socket(AddressFamily.InterNetwork,
                               SocketType.Dgram,
                               ProtocolType.Udp);
    IWaveProvider provider = new AudioFileReader(filename);
    MuLawChatCodec codec = new MuLawChatCodec(); // <-- from chat example
    int alignment = provider.WaveFormat.BlockAlign * 32; // <-- arbitrary
    byte[] buffer = new byte[alignment];
    try
    {
        int numbytes;
        // Read until the provider reports end of stream (0 bytes).
        while ((numbytes = provider.Read(buffer, 0, alignment)) != 0)
        {
            // Encode the block, then send the encoded bytes (not the raw buffer).
            byte[] encoded = codec.Encode(buffer, 0, numbytes);
            sender.SendTo(encoded, encoded.Length, SocketFlags.None, target);
        }
    }
    catch (Exception)
    {
        // We just assume that any exception is an exit signal.
    }
What happens is that the while loop just grabs "audio" as fast as it can and blows it out the UDP port. This won't work for RTP, since we need to maintain the proper output timing.
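To make the timing requirement concrete, here is a minimal sketch of the kind of pacing I mean, driven by a Stopwatch instead of the sound card. It is not an NAudio component: PacedSender, DueTime, and Run are hypothetical names, and the 160 samples per packet at 8000 Hz (20 ms of u-law audio) used in the comments are assumed values.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper (not part of NAudio): releases packet n at
// n * packetDuration, measured against a monotonic Stopwatch.
public static class PacedSender
{
    // Time at which packet `index` is due, relative to the start of the stream.
    // Example (assumed values): 160 samples at 8000 Hz gives 20 ms per packet.
    public static TimeSpan DueTime(long index, int samplesPerPacket, int sampleRate)
    {
        // Derive the deadline from the packet index rather than summing sleeps,
        // so rounding and scheduler jitter do not accumulate as drift.
        return TimeSpan.FromTicks(
            index * samplesPerPacket * TimeSpan.TicksPerSecond / sampleRate);
    }

    // Pulls packets from readPacket until it returns null or an empty array,
    // and hands each one to send no earlier than its due time.
    public static void Run(Func<byte[]> readPacket, Action<byte[]> send,
                           int samplesPerPacket, int sampleRate)
    {
        Stopwatch clock = Stopwatch.StartNew();
        long index = 0;
        byte[] packet;
        while ((packet = readPacket()) != null && packet.Length > 0)
        {
            TimeSpan wait = DueTime(index, samplesPerPacket, sampleRate) - clock.Elapsed;
            if (wait > TimeSpan.Zero)
            {
                Thread.Sleep(wait); // coarse wait; a late packet is sent immediately
            }
            send(packet);
            index++;
        }
    }
}
```

In the loop above, readPacket would wrap provider.Read plus codec.Encode, and send would add the RTP header and call sender.SendTo. Thread.Sleep is only as precise as the OS timer, so individual packets can go out some milliseconds late, but because each deadline is computed from the packet index the average rate should stay correct.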
As a test, I tried a WaveOut with a NotifyingSampleProvider, feeding each L/R pair to the encoder/RTP-ifier/sender, and it seemed to work fine. However, the side effect of the audio playing out of the local speaker (via WaveOut) is not acceptable for the application I'm working on (e.g. we may want to stream multiple different files to different devices simultaneously). We also might be using the audio hardware for (e.g.) simultaneous softphone conversations. Basically, we don't want to actually use the local audio hardware in this implementation.
So, does anyone know of (or has anyone written) a component that can provide the proper timing for the sender side of things? Something that can grab audio at the proper rate so that I can feed the encoder/sender chain?