
I am implementing RTP on an embedded MCU (STM32F4) and am having trouble streaming audio data (8 kHz, u-law encoded) efficiently.

For chunking audio data (20ms, 160 bytes) should I:

  1. Send a single RTP packet (12 byte header + 160 bytes of audio) over a single UDP datagram, or
  2. Send multiple RTP packets over a single UDP datagram (since we can fit multiple within a single UDP MTU)

If (2), should there be an RTP header for each 160 bytes of audio data within the single UDP datagram? For example, 5 RTP packets would carry 800 bytes of audio data. Would I send:

  • RTP_Header->Audio_data(160 bytes)->RTP_Header->Audio_data(160 bytes)->RTP_Header->Audio_data(160 bytes)...
  • RTP_Header->Audio_data (800 bytes)

Using Linphone as a client for testing, I am noticing multiple Out Of Time packets and a short delay between speaking into my embedded device and hearing the audio on Linphone. I'm trying to determine whether streaming data over UDP more efficiently will fix it. I do not get the same delay when speaking into Linphone and playing out of my embedded device, and the delay between the two is making echo cancellation on the embedded MCU difficult.

1
Please clarify: are you getting Out Of Time packets and audio delay with implementation 1 or 2? I would not expect implementation 2 to work at all. - 1.618
With implementation 1. The voice delay seems resolved by switching to a different networking stack (NetX instead of LwIP). Still getting some Out Of Time packets, but not nearly as many. You're right, implementation 2 did not work. - Sean

1 Answer


Given that RTP carries real-time data and each RTP payload corresponds to a specific moment in time, it makes no sense to combine multiple RTP packets (which come from different times) into the same UDP datagram. Each 20 ms payload should instead be prefixed by its own RTP header and sent immediately in its own UDP datagram.
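
To make the one-header-per-datagram layout concrete, here is a minimal sketch in C of building a single packet for the 20 ms / 160-byte u-law case from the question. The names `rtp_stream_t` and `rtp_build_packet` are hypothetical, not part of any particular stack, and the sketch assumes the fixed 12-byte header with no CSRC entries or extension, the static payload type 0 (PCMU), and one byte per sample so that the timestamp advances by the payload length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RTP_HEADER_LEN 12u /* fixed header, no CSRC list, no extension */
#define RTP_PT_PCMU    0u  /* static payload type for G.711 u-law */

/* Hypothetical per-stream state; the names are illustrative only. */
typedef struct {
    uint16_t seq;       /* advances by 1 per packet, wraps at 65535 */
    uint32_t timestamp; /* advances by the number of samples per packet */
    uint32_t ssrc;      /* stream identifier, normally chosen at random */
} rtp_stream_t;

/* Writes one RTP packet (header + payload) into out.
 * Returns the packet length, or 0 if out is too small. */
size_t rtp_build_packet(rtp_stream_t *st, const uint8_t *payload,
                        size_t payload_len, uint8_t *out, size_t out_cap)
{
    assert(st != NULL && payload != NULL && out != NULL);
    if (out_cap < RTP_HEADER_LEN + payload_len) {
        return 0;
    }

    out[0] = 0x80;        /* V=2, P=0, X=0, CC=0 */
    out[1] = RTP_PT_PCMU; /* M=0, PT=0 */

    /* All multi-byte fields are in network byte order (big-endian). */
    out[2]  = (uint8_t)(st->seq >> 8);
    out[3]  = (uint8_t)(st->seq & 0xFFu);
    out[4]  = (uint8_t)(st->timestamp >> 24);
    out[5]  = (uint8_t)(st->timestamp >> 16);
    out[6]  = (uint8_t)(st->timestamp >> 8);
    out[7]  = (uint8_t)(st->timestamp);
    out[8]  = (uint8_t)(st->ssrc >> 24);
    out[9]  = (uint8_t)(st->ssrc >> 16);
    out[10] = (uint8_t)(st->ssrc >> 8);
    out[11] = (uint8_t)(st->ssrc);

    memcpy(out + RTP_HEADER_LEN, payload, payload_len);

    st->seq++;
    /* Assumption: PCMU at one byte per sample, so 160 bytes = 160 ticks. */
    st->timestamp += (uint32_t)payload_len;

    return RTP_HEADER_LEN + payload_len;
}
```

Under these assumptions you would call this once per 20 ms audio frame and hand the resulting 172 bytes to whatever UDP send function your network stack provides, one datagram per call.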