
I am using NAudio for my audio needs, but I've run into a thorny issue. I have a remote device that can receive RTP audio. I would like to stream an audio file to that device (after u-law or similar encoding + RTP wrapping). However, there doesn't seem to be a mechanism to maintain the outgoing timing for the RTP packets.

For example, a WaveOut player "manages" timing simply by responding to requests from the underlying sound/DirectX layers. In other words, the timing is actually maintained by the sound drivers using a "pull" method.

What I'm looking for is a component that can provide the correct "pull" timing on an (e.g.) IWaveProvider (or similar) so that I can take each packet, RTP-ify it, and send it over the wire.
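
To be clear, the "RTP-ify" step itself isn't the problem; it's just prepending the standard 12-byte RTP header (RFC 3550) to each encoded payload. A minimal sketch, with a hypothetical helper (not part of NAudio) and the sequence/timestamp/SSRC state kept by the caller:

// Hypothetical sketch: wrap an encoded payload in a 12-byte RTP
// header.  Payload type 0 = PCMU (u-law); all multi-byte fields
// are big-endian per RFC 3550.
static byte[] RtpWrap(byte[] payload, ushort seq, uint timestamp, uint ssrc)
{
    byte[] packet = new byte[12 + payload.Length];
    packet[0] = 0x80;                     // V=2, no padding/extension/CSRC
    packet[1] = 0;                        // M=0, PT=0 (PCMU)
    packet[2] = (byte)(seq >> 8);         // sequence number
    packet[3] = (byte)seq;
    packet[4] = (byte)(timestamp >> 24);  // timestamp
    packet[5] = (byte)(timestamp >> 16);
    packet[6] = (byte)(timestamp >> 8);
    packet[7] = (byte)timestamp;
    packet[8] = (byte)(ssrc >> 24);       // SSRC
    packet[9] = (byte)(ssrc >> 16);
    packet[10] = (byte)(ssrc >> 8);
    packet[11] = (byte)ssrc;
    Buffer.BlockCopy(payload, 0, packet, 12, payload.Length);
    return packet;
}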

So, here's the core code:

IPEndPoint target = new IPEndPoint(addr, port);
Socket sender = new Socket( AddressFamily.InterNetwork,
                            SocketType.Dgram,
                            ProtocolType.Udp );

IWaveProvider provider = new AudioFileReader(filename);
MuLawChatCodec codec = new MuLawChatCodec();  // <-- from chat example
int alignment = provider.WaveFormat.BlockAlign * 32;  // <-- arbitrary
byte [] buffer = new byte [alignment];

try
{
    int numbytes;
    while( (numbytes = provider.Read(buffer, 0, alignment)) != 0 )
    {
        byte [] encoded = codec.Encode(buffer, 0, numbytes);
        sender.SendTo(encoded, encoded.Length, SocketFlags.None, target);
    }
}
catch( Exception )
{
    // We just assume that any exception is an exit signal.
}

What happens is that the while loop just grabs "audio" as fast as it can and blows it out the UDP port. This won't work for RTP, since we need to maintain the proper output timing.
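
The obvious band-aid is to pace the loop myself off the source format, something like this (a sketch only, reusing the variables from above):

// Sketch: sleep for the real-time duration of each block, derived
// from the source format.  Thread.Sleep's coarse (~15ms) resolution
// makes the timing jittery and lets the error accumulate, so this
// isn't good enough for RTP.
int bytespersec = provider.WaveFormat.AverageBytesPerSecond;
int numbytes;
while( (numbytes = provider.Read(buffer, 0, alignment)) != 0 )
{
    byte [] encoded = codec.Encode(buffer, 0, numbytes);
    sender.SendTo(encoded, encoded.Length, SocketFlags.None, target);
    System.Threading.Thread.Sleep((int)(1000L * numbytes / bytespersec));
}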

As a test, I tried a WaveOut with a NotifyingSampleProvider, feeding each L/R pair to the encoder/RTP-ifier/sender, and it seemed to work fine. However, the side effect of the audio playing out of the local speakers (via WaveOut) is not acceptable for the application I'm working on (e.g. we may want to stream multiple different files to different devices simultaneously). We also might be using the audio hardware for (e.g.) simultaneous softphone conversations. Basically, we don't want to actually use the local audio hardware in this implementation.

So, does anyone know of (or has anyone written) a component that can provide the proper timing for the sender side of things? Something that can grab audio at the proper rate so that I can feed the encoder/sender chain?


2 Answers

3 votes

In case anyone is interested, I got it to work. I had to add a few components and refine the timing down to a very fine level.

Added:

  • Now using NAudio.Wave.Mp3FileReader. I use this because it provides timing information with each frame that you read.
  • Configurable 'buffering' (it buffers frame times, not actual audio)
  • Managing timing more precisely with System.Diagnostics.Stopwatch and Socket.Poll

Here's a condensed (single-threaded) version of the code with no try/catch blocks, etc. The actual implementation uses a player thread and some other sync mechanisms, and lets you 'cancel' playback by killing the socket.

// Target endpoint.  Use the default Barix UDP 'high priority'
// port.
IPEndPoint target = new IPEndPoint( IPAddress.Parse("192.168.1.100"), 3030 );

// Create reader...
NAudio.Wave.Mp3FileReader reader = new Mp3FileReader("hello.mp3");

// Build a simple udp-socket for sending.
Socket sender = new Socket( AddressFamily.InterNetwork,
                            SocketType.Dgram,
                            ProtocolType.Udp );

// Now for some 'constants.'
double ticksperms = (double)Stopwatch.Frequency;
ticksperms /= 1000.0;

// We manage 'buffering' by just accumulating a linked list of
// mp3 frame times.  The maximum time is defined by our buffer
// time.  For this example, use a 2-second 'buffer'.
// 'framebufferticks' tracks the total time in the buffer.
double framebuffermaxticks = ticksperms * 2000.0;
LinkedList<double> framebuffer = new LinkedList<double>();
double framebufferticks = 0.0;

// We'll track the total mp3 time in ticks.  We'll also need a
// stopwatch for the internal timing.
double expectedticks = 0.0;
Stopwatch sw = new Stopwatch();
long startticks = Stopwatch.GetTimestamp();

// Now we just read frames until a null is returned.
int totalbytes = 0;
Mp3Frame frame;
while( (frame = reader.ReadNextFrame()) != null )
{
    // Make sure the frame buffer is valid.  If not, we'll
    // quit sending.
    byte [] rawdata = frame.RawData;
    if( rawdata == null ) break;

    // Send the complete frame.
    totalbytes += rawdata.Length;
    sender.SendTo(rawdata, target);

    // Timing is next.  Get the current total time and calculate
    // this frame.  We'll also need to calculate the delta for
    // later.
    double expectedms = reader.CurrentTime.TotalMilliseconds;
    double newexpectedticks = expectedms * ticksperms;
    double deltaticks = newexpectedticks - expectedticks;
    expectedticks = newexpectedticks;

    // Add the frame time to the list (and adjust the total
    // frame buffer time).  If we haven't exceeded our buffer
    // time, then just go get the next packet.
    framebuffer.AddLast(deltaticks);
    framebufferticks += deltaticks;
    if( framebufferticks < framebuffermaxticks ) continue;

    // Pop one delay out of the queue and adjust values.
    double framedelayticks = framebuffer.First.Value;
    framebuffer.RemoveFirst();
    framebufferticks -= framedelayticks;

    // Now we just wait....
    double lastelapsed = 0.0;
    sw.Reset();
    sw.Start();
    while( lastelapsed < framedelayticks )
    {
        // We do short burst delays with Socket.Poll() because it
        // supports a much higher timing precision than
        // Thread.Sleep().
        sender.Poll(100, SelectMode.SelectError);
        lastelapsed = (double)sw.ElapsedTicks;
    }

    // We most likely waited more ticks than we expected.  Timewise,
    // this isn't a lot, but it could accumulate into a large 'drift'
    // if this is a very large file.  We lower the total buffer/queue
    // tick total by the overage.
    if( lastelapsed > framedelayticks )
    {
        framebufferticks -= (lastelapsed - framedelayticks);
    }
}

// Done sending the file.  Now we'll just do one final wait to let
// our 'buffer' empty.  The total time is our frame buffer ticks
// plus any latency.
double elapsed = 0.0;
sw.Reset();
sw.Start();
while( elapsed < framebufferticks )
{
    sender.Poll(1000, SelectMode.SelectError);
    elapsed = (double)sw.ElapsedTicks;
}

// Dump some final timing information:
double diffticks = (double)(Stopwatch.GetTimestamp() - startticks);
double diffms = diffticks / ticksperms;
Console.WriteLine("Sent {0} byte(s) in {1}ms (filetime={2}ms)",
                 totalbytes, diffms, reader.CurrentTime.TotalMilliseconds);
0 votes

Basically, you need to generate a clock to drive sample delivery. AudioFileReader inherits from WaveStream, which provides the current time for the samples via CurrentTime (calculated from WaveFormat.AverageBytesPerSecond). You should be able to use this to timestamp the audio packets.
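
For example, here's a rough (untested) sketch that throttles sends so the stream's CurrentTime never runs ahead of a wall-clock Stopwatch; the names are illustrative:

// Rough sketch (untested): pace sends so the stream's CurrentTime
// never runs ahead of real time.
var reader = new AudioFileReader(filename);  // AudioFileReader is a WaveStream
var clock = System.Diagnostics.Stopwatch.StartNew();
byte[] buf = new byte[reader.WaveFormat.AverageBytesPerSecond / 50];  // ~20ms blocks
int n;
while( (n = reader.Read(buf, 0, buf.Length)) != 0 )
{
    // ...encode + RTP-wrap + SendTo the block here...

    // If we've read ahead of the wall clock, wait for it to catch up.
    while( reader.CurrentTime.TotalMilliseconds > clock.ElapsedMilliseconds )
        System.Threading.Thread.Sleep(1);
}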

Also, depending on your project needs, you may want to look at StreamCoder's MediaSuite.NET product. I've used it in the past to do similar things; it abstracts away much of the transport-level complexity for this sort of thing.