4
votes

For a VoIP speech quality monitoring application I need to compare an incoming RTP audio stream to a reference signal. For the signal comparison itself I use pre-existing, special-purpose tools. For the other parts (except packet capture) the Gstreamer library seemed to be a good choice. I use the following pipeline to simulate a bare-bones VoIP client:

filesrc location=foobar.pcap ! pcapparse ! "application/x-rtp, payload=0, clock-rate=8000"
  ! gstrtpjitterbuffer ! rtppcmudepay ! mulawdec ! audioconvert
  ! audioresample ! wavenc ! filesink location=foobar.wav

The pcap file contains a single RTP media stream. I crafted a capture file that's missing 50 of the original 400 UDP datagrams. For the given audio sample (8s long for my example):

[XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX]

with a certain amount of consecutive packet loss I'd expect an audio signal like this to be output ('-' denotes silence):

[XXXXXXXXXXXXXXXXXXXXXXXX-----XXXXXXXXXXX]

however what is actually saved in the audio file is this (1s shorter for my example):

[XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX]

It seems that the jitter buffer, a crucial part for this application, is not working properly. Could this be an incompatibility with / a shortcoming of the pcapparse element? Am I missing a key part in the pipeline to ensure time synchronization? What else could be causing this?

2

2 Answers

2
votes

GStreamer may just use the dejitter buffer to smooth out the packets on the way to the (audio) output. This wouldn't be unusual, its the bare minimum definition of dejittering.

It may go so far as reordering out-of-order packets or deleting duplicates, but packet loss concealment (your scenario) can be quite complex.

Basic implementations just duplicate the last received packet, whilst more advanced implementation analyse and reconstruct the tone of the last received packets to smooth out the audio.

It sounds like your application performance will depend on the exact implementation of loss concealment, so even if GStreamer does do "something", you may have a hard time quantifying its impact on your results unless you understand it in great detail.

Perhaps you could try a pcap with a couple of out-of-order and duplicate packets and check if GStreamer at least reorders/deletes them, that would go someway to clarifying what is happening.

2
votes

The issue could be solved by slightly altering the pipeline: An audiorate element needs to be added before wavenc that "produces a perfect stream by inserting or dropping samples as needed". However this works only if audiorate receives the packet-loss events. For this the do-lost property of gstjitterbuffer needs to be set to true.

Here's the corrected pipeline:

filesrc location=foobar.pcap ! pcapparse
  ! "application/x-rtp, payload=0, clock-rate=8000"
  ! gstrtpjitterbuffer do-lost=true ! rtppcmudepay ! mulawdec
  ! audioconvert ! audioresample ! audiorate ! wavenc
  ! filesink location=foobar.wav