23
votes

I'm building a cross-platform web app where audio is generated on-the-fly on the server and live streamed to a browser client, probably via the HTML5 audio element. On the browser, I'll have Javascript-driven animations that must precisely sync with the played audio. "Precise" means that the audio and animation must be within a second of each other, and hopefully within 250ms (think lip-syncing). For various reasons, I can't do the audio and animation on the server and live-stream the resulting video.

Ideally, there would be little or no latency between the audio generation on the server and the audio playback on the browser, but my understanding is that latency will be difficult to control and probably in the 3-7 second range (browser-, environment-, network- and phase-of-the-moon-dependent). I can handle that, though, if I can precisely measure the actual latency on-the-fly so that my browser Javascript knows when to present the proper animated frame.

So, I need to precisely measure the latency between my handing audio to the streaming server (Icecast?) and the audio coming out of the speakers on the computer running the browser. Some blue-sky possibilities:

  • Add metadata to the audio stream, and parse it from the playing audio (I understand this isn't possible using the standard audio element)

  • Add brief periods of pure silence to the audio, and then detect them on the browser (can audio elements yield the actual audio samples?)

  • Query the server and the browser as to the various buffer depths

  • Decode the streamed audio in Javascript and then grab the metadata

Any thoughts as to how I could do this?

3
I don't think there is a way to measure the difference, so I usually analyze the stream client-side in the browser using the Web Audio API. Using that technique you can make animations (an equalizer, for example) that correlate closely with the Icecast stream being played in the browser. - Alex Paramonov

3 Answers

9
votes

Use the timeupdate event of the <audio> element, which typically fires only a few times per second, to drive animations during streaming playback by checking the element's .currentTime. This lets animations or transitions be started or stopped up to several times per second.
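Because timeupdate is coarse, one option is to poll .currentTime from requestAnimationFrame and derive the animation frame from it. The following is a minimal sketch, assuming a fixed animation frame rate. The names `FPS`, `frameForTime`, `syncToAudio` and `drawFrame` are hypothetical and not part of the original answer.

```javascript
// Hypothetical animation frame rate; adjust to match your animation.
const FPS = 25;

// Map a playback position in seconds to an animation frame index.
function frameForTime(currentTime, fps) {
  return Math.floor(currentTime * fps);
}

// Browser-only wiring: redraw whenever the frame index changes.
// `drawFrame` is a placeholder for your own rendering function.
function syncToAudio(audio, drawFrame) {
  let lastFrame = -1;
  function tick() {
    const frame = frameForTime(audio.currentTime, FPS);
    if (frame !== lastFrame) {
      lastFrame = frame;
      drawFrame(frame);
    }
    requestAnimationFrame(tick);
  }
  requestAnimationFrame(tick);
}
```

Note that .currentTime reports the position in the media timeline, so any delay in the audio output device is still not accounted for.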

If the browser supports it, you can use fetch() to request the audio resource and, in .then(), return response.body.getReader(), which gives a reader for the ReadableStream of the response body. Create a new MediaSource object and set the .src of the <audio> element (or of new Audio()) to an object URL of the MediaSource. Append the first chunk read from the stream to a sourceBuffer of the MediaSource with .mode set to "sequence", then append each remaining chunk on the sourceBuffer's updateend event.

If fetch() response.body.getReader() is not available in the browser, you can still use the timeupdate or progress event of the <audio> element to check .currentTime and start or stop animations or transitions at the required second of streaming media playback.

Use the canplay event of the <audio> element to start playback once the MediaSource has buffered enough of the stream to proceed.

You can use an object whose property names are numbers corresponding to the .currentTime of the <audio> element at which an animation should occur, and whose values are the CSS property values to apply to the animated element.

In the JavaScript below, the background color changes every twenty seconds, beginning at 0, and a message is animated every sixty seconds until media playback has concluded.

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">    
<head>
  <meta charset="utf-8" />
  <title></title>
  <style>
    body {
      width: 90vw;
      height: 90vh;
      background: #000;
      transition: background 1s;
    }

    span {
      font-family: Georgia;
      font-size: 36px;
      opacity: 0;
    }
  </style>
</head>

<body>
  <audio controls></audio>
  <br>
  <span></span>
  <script type="text/javascript">
    window.onload = function() {
      var url = "/path/to/audio";
      // given 240 seconds total duration of audio 
      // 240/12 = 20
      // properties correspond to `<audio>` `.currentTime`,
      // values correspond to color to set at element
      var colors = {
        0: "red",
        20: "blue",
        40: "green",
        60: "yellow",
        80: "orange",
        100: "purple",
        120: "violet",
        140: "brown",
        160: "tan",
        180: "gold",
        200: "sienna",
        220: "skyblue"
      };
      var body = document.querySelector("body");
      var mediaSource = new MediaSource;
      var audio = document.querySelector("audio");
      var span = document.querySelector("span");
      var color = window.getComputedStyle(body)
                  .getPropertyValue("background-color");
      //console.log(mediaSource.readyState); // closed
      var mimecodec = "audio/mpeg";

      audio.oncanplay = function() {
        this.play();
      }

      audio.ontimeupdate = function() {         
        // 240/12 = 20
        var curr = Math.round(this.currentTime);

        if (colors.hasOwnProperty(curr)) {
          // set `color` to `colors[curr]`
          color = colors[curr]
        }
        // animate `<span>` every 60 seconds
        if (curr % 60 === 0 && span.innerHTML === "") {
          var t = curr / 60;
          span.innerHTML = t + " minute" + (t === 1 ? "" : "s") 
                           + " of " + Math.round(this.duration) / 60 
                          + " minutes of audio";
          span.animate([{
              opacity: 0
            }, {
              opacity: 1
            }, {
              opacity: 0
            }], {
              duration: 2500,
              iterations: 1
            })
            .onfinish = function() {
              span.innerHTML = ""
            }
        }
        // change `background-color` of `body` every 20 seconds
        body.style.backgroundColor = color;
        console.log("current time:", curr
                   , "current background color:", color
                  , "duration:", this.duration);
      }
      // set `<audio>` `.src` to `mediaSource`
      audio.src = URL.createObjectURL(mediaSource);
      mediaSource.addEventListener("sourceopen", sourceOpen);

      function sourceOpen(event) {
        // if the media type is supported by `mediaSource`
        // fetch resource, begin stream read, 
        // append stream to `sourceBuffer`
        if (MediaSource.isTypeSupported(mimecodec)) {
          var sourceBuffer = mediaSource.addSourceBuffer(mimecodec);
          // set `sourceBuffer` `.mode` to `"sequence"`
          sourceBuffer.mode = "sequence";

          fetch(url)
          // return `ReadableStream` of `response`
          .then(response => response.body.getReader())
          .then(reader => {

            // resolve once the whole stream has been appended
            return new Promise((resolve) => {
              var processStream = (data) => {
                if (data.done) {
                  // every chunk is appended and `sourceBuffer` is idle,
                  // so it is safe to signal end of stream to `mediaSource`
                  // (`endOfStream()` throws while `sourceBuffer` is updating)
                  if (mediaSource.readyState === "open") {
                    mediaSource.endOfStream();
                  }
                  resolve(mediaSource.readyState);
                  return;
                }
                // append chunk of stream to `sourceBuffer`
                sourceBuffer.appendBuffer(data.value);
              }
              // at `sourceBuffer` `updateend` call `reader.read()`,
              // to read next chunk of stream, append chunk to
              // `sourceBuffer`
              sourceBuffer.addEventListener("updateend", function() {
                reader.read().then(processStream);
              });
              // start processing stream
              reader.read().then(processStream);
            })
          })
          // log `mediaSource.readyState` once the stream has ended
          .then(msg => console.log(msg))
        } 
        // if `mimecodec` is not supported by `MediaSource`  
        else {
          alert(mimecodec + " not supported");
        }
      };
    }
  </script>
</body>
</html>

plnkr http://plnkr.co/edit/fIm1Qp?p=preview
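
The exact-match lookup in the example only changes the color if timeupdate fires while the rounded .currentTime equals one of the keys, so a key can be skipped after a seek or a stall. One possible variation, sketched under the assumption that the most recent cue should always apply, looks up the latest cue at or before the current time. The names `cues`, `cueTimes` and `cueAt` are hypothetical.

```javascript
// Cue times in seconds mapped to values, as in the `colors` object above.
const cues = { 0: "red", 20: "blue", 40: "green" };

// Sort the cue times once so each lookup is a simple scan.
const cueTimes = Object.keys(cues).map(Number).sort((a, b) => a - b);

// Return the value of the latest cue at or before `currentTime`,
// or undefined if playback has not reached the first cue yet.
function cueAt(currentTime) {
  let value;
  for (const t of cueTimes) {
    if (t > currentTime) break;
    value = cues[t];
  }
  return value;
}
```

Inside the timeupdate handler this would be used as `color = cueAt(this.currentTime) || color`.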

1
votes

There is no way for you to measure the latency directly, but any audio element generates events such as 'playing' when playback starts or resumes, 'stalled' when streaming has stopped, and 'waiting' when data is still loading. So what you can do is drive your animation based on these events.

So pause the animation when stalled or waiting fires, then resume it when playing fires again.

I also advise you to check other events that might affect your flow (error, for example, would be important for you).
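
The idea can be sketched as a small state machine. This is only an illustration: the event names are standard media element events, but the two states and the names `nextAnimationState`, `followAudio` and `onChange` are assumptions made for this sketch.

```javascript
// Decide whether the animation should run after a media event.
// The "running" / "paused" states are an assumption of this sketch.
function nextAnimationState(state, eventType) {
  switch (eventType) {
    case "playing":
      return "running";
    case "waiting":
    case "stalled":
    case "pause":
    case "ended":
    case "error":
      return "paused";
    default:
      return state; // ignore unrelated events
  }
}

// Wire the state machine to an <audio> element. `onChange` is a
// hypothetical callback that pauses or resumes your animation.
function followAudio(audio, onChange) {
  let state = "paused";
  const types = ["playing", "waiting", "stalled", "pause", "ended", "error"];
  types.forEach((type) => {
    audio.addEventListener(type, () => {
      const next = nextAnimationState(state, type);
      if (next !== state) {
        state = next;
        onChange(state);
      }
    });
  });
}
```

In a page this could be called as `followAudio(document.querySelector("audio"), (s) => console.log(s))`.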

https://developer.mozilla.org/en-US/docs/Web/API/HTMLAudioElement

0
votes

What I would try is to first create a timestamp with performance.now(), process the data, and record the output in a Blob with the MediaRecorder API.

The recorder will ask the user for access to their audio input device, which can be a problem for your app, but it looks mandatory for measuring the real latency.

Once this is done, there are many ways to measure the actual latency between generation and rendering. Basically, you detect a sound event in the recording.
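
As a rough illustration of detecting such a sound event, the sketch below finds the first loud sample in decoded audio and converts its position to a latency. It assumes the recording has already been decoded to an array of samples in the range [-1, 1], and that both timestamps come from the same clock (for example performance.now() in the same page). The names `firstSoundIndex` and `estimateLatencyMs` and the default threshold are hypothetical.

```javascript
// Find the index of the first sample whose magnitude exceeds `threshold`.
// Returns -1 if nothing in `samples` is loud enough.
function firstSoundIndex(samples, threshold) {
  for (let i = 0; i < samples.length; i++) {
    if (Math.abs(samples[i]) > threshold) return i;
  }
  return -1;
}

// Estimate the latency in milliseconds between generating a marker
// sound (`sentAtMs`) and hearing it in a recording that started at
// `recordingStartMs`. Returns null if the marker is not found.
function estimateLatencyMs(samples, sampleRate, recordingStartMs, sentAtMs, threshold = 0.1) {
  const index = firstSoundIndex(samples, threshold);
  if (index < 0) return null;
  const heardAtMs = recordingStartMs + (index / sampleRate) * 1000;
  return heardAtMs - sentAtMs;
}
```

If the sound is generated on the server, the server and browser clocks would also have to be synchronized, which this sketch does not attempt.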

For further reference and example:

Recorder demo

https://github.com/mdn/web-dictaphone/

https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder_API/Using_the_MediaRecorder_API