I'm adding delay to an incoming audio-only WebRTC stream using the Web Audio API's DelayNode in Google Chrome.

connectionRecv.onaddstream = (event) => {
  const recvAudio = new Audio();
  recvAudio.srcObject = event.stream.clone();
  recvAudio.autoplay = true;

  recvAudio.onloadedmetadata = () => {

    // controls if original stream should also be played
    // true causes WebRTC getStats() receive track audioLevel == 0
    recvAudio.muted = muteOriginalStream;

    const recvAudioSource = audioContext.createMediaStreamSource(recvAudio.srcObject as MediaStream);
    const delayNode = audioContext.createDelay();
    delayNode.delayTime.value = 1; // delay by 1 second
    recvAudioSource.connect(delayNode);
    delayNode.connect(audioContext.destination);
  };
};

This works, with one small problem. I want to mute the original stream so that I don't hear double audio (the original stream plus the delayed stream). But when I mute the original stream so that I hear only the delayed stream, RTCPeerConnection getStats() returns 0 for the receive track's audioLevel.

I've tried many different ways of piping around the received stream so I only hear the delayed version, but either I can't hear any audio at all or the getStats() audioLevel is 0. I'm looking for an implementation that preserves the non-zero audioLevel from WebRTC getStats(), while only the delayed stream is playing through the audio output device. To put it simply:

How can I play only the delayed stream without zeroing out the WebRTC getStats() receive track audioLevel?

I've created a minimal reproduction of the issue on StackBlitz. It sets up a loopback WebRTC connection in which the sender and receiver are the same browser. Inspect the console logs to see the receive track audioLevel returned from getStats().

Note: if you use my stackblitz, I suggest you use headphones to avoid a feedback loop.

1 Answer

I don't have a solution for your stated problem, but as a workaround to get level information, you could add an AnalyserNode to your audio context and use either the time-domain or frequency-domain data it provides to compute an audio level yourself.
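
A minimal sketch of that workaround, using the time-domain data. The names computeRms, readLevel, and TimeDomainSource are hypothetical helpers, not part of any API; only createAnalyser(), fftSize, and getFloatTimeDomainData() come from the Web Audio API. The result is a locally computed RMS value, so it approximates loudness but is not guaranteed to match the audioLevel that getStats() reports.

```typescript
// Minimal structural type covering the two AnalyserNode members used below,
// so the level math can also be exercised outside a browser (hypothetical name).
interface TimeDomainSource {
  fftSize: number;
  getFloatTimeDomainData(array: Float32Array): void;
}

// Root-mean-square of a block of samples in [-1, 1]; 0 means silence.
function computeRms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumOfSquares = samples.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumOfSquares / samples.length);
}

// Reads the analyser's current time-domain block and returns its RMS level.
function readLevel(analyser: TimeDomainSource): number {
  const buffer = new Float32Array(analyser.fftSize);
  analyser.getFloatTimeDomainData(buffer);
  return computeRms(buffer);
}

// Browser wiring (assumed, reusing the names from the question's snippet):
//   const analyser = audioContext.createAnalyser();
//   recvAudioSource.connect(analyser); // taps the undelayed signal
//   setInterval(() => console.log(readLevel(analyser)), 500);
```

The analyser only observes the signal, so it does not need to be connected to audioContext.destination; the delayed path through the DelayNode stays the only thing that is audible.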