I have done voice chatting between two node.js servers before (see: tvoip), which works quite well, but now I would like to do it between a node.js server and a browser. How could this be done?
From node.js to node.js I simply used raw PCM streams over a TCP connection.
For the browser this is probably not going to be that easy, right? I mean the browser doesn't really offer a TCP API. It does offer a WebSocket API, but does it handle streams? Would I have to convert the streams and if so into what format and how? What protocol should I use? Are there any helpful libraries to accomplish this already? Is socket.io-stream a viable library to send these kinds of streams?
From what I understand, audio streams in the browser are in PCM format, so they should be compatible with the streams I have in Node.js. Is that assumption correct?
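As far as I can tell, the Web Audio API hands out samples as 32-bit floats in the range [-1, 1], while my Node.js streams use 16-bit signed little-endian integers (S16_LE), so I assume I would at least need a conversion step, roughly like this sketch (`floatTo16BitPCM` is just a name I made up, and the sample rates on both ends would also have to match):

```javascript
// Convert Web Audio Float32 samples (range [-1, 1]) into 16-bit signed
// little-endian PCM, the S16_LE format used on the Node.js side.
function floatTo16BitPCM(float32Samples) {
    const out = new DataView(new ArrayBuffer(float32Samples.length * 2))
    for (let i = 0; i < float32Samples.length; i++) {
        // Clamp to [-1, 1], then scale to the 16-bit signed integer range
        const s = Math.max(-1, Math.min(1, float32Samples[i]))
        out.setInt16(i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7FFF), true) // true = little-endian
    }
    return out.buffer // ArrayBuffer of raw PCM bytes
}
```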
I have managed to pipe the browser mic input to the browser speaker output like this:
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8"/>
</head>
<body>
    <!-- alternative method that also works
    <audio></audio>
    <script>
        navigator.mediaDevices.getUserMedia({ audio: true }).then(function(stream) {
            const audio = document.querySelector('audio')
            audio.srcObject = stream
            audio.onloadedmetadata = function(e) {
                audio.play()
            }
        }).catch(console.error)
    </script>
    -->
    <script>
        navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
            const aCtx = new AudioContext()
            const analyser = aCtx.createAnalyser()
            const microphone = aCtx.createMediaStreamSource(stream)
            microphone.connect(analyser)
            analyser.connect(aCtx.destination)
        }).catch(err => {
            console.error("Error getting audio stream from getUserMedia", err)
        })
    </script>
</body>
</html>
As you can see I found two solutions. I will try to base the node<->browser voice chat on the second one.
For Node.js I came up with this code to pipe a node.js mic input to a node.js speaker output:
const mic = require('mic')
const Speaker = require('speaker')

const micInstance = mic({           // arecord -D hw:0,0 -f S16_LE -r 44100 -c 2
    device: 'hw:2,0',               // -D hw:0,0
    encoding: 'signed-integer',     // -f S
    bitwidth: '16',                 // 16
    endian: 'little',               // _LE
    rate: '44100',                  // -r 44100
    channels: '1',                  // -c 2
    debug: true
})
const micInputStream = micInstance.getAudioStream()

const speakerInstance = new Speaker({ // | aplay -D plughw:CARD=0,DEV=0
    channels: 1,
    bitDepth: 16,
    sampleRate: 44100,
    signed: true,
    device: 'plughw:2,0' //'plughw:NVidia,7'
})
speakerInstance.on('open', () => {
    console.log("Speaker received stuff")
})

// Pipe the readable microphone stream to the writable speaker stream:
micInputStream.pipe(speakerInstance)

micInputStream.on('data', data => {
    //console.log("Received Input Stream: " + data.length)
})
micInputStream.on('error', err => {
    console.error("Error in Input Stream: " + err)
})

micInstance.start()
console.log('Started')
Finding the right device for mic and speaker can be a bit tricky if you are not familiar with ALSA on Linux. It is explained here in case you are unsure. I am not certain how it works on Windows and macOS with SoX.
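On Linux, the available ALSA devices can be listed with the alsa-utils tools; the card/device numbers they print map onto the `hw:X,Y` strings used above:

```shell
# List ALSA capture devices (candidates for the mic `device` option)
arecord -l

# List ALSA playback devices (candidates for the Speaker `device` option)
aplay -l

# A line like "card 2: ..., device 0: ..." corresponds to hw:2,0 / plughw:2,0
```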
I then came up with a small test application to connect the two ideas using socket.io-stream (a socket.io library that allows sending streams over a socket). And obviously, this is where I'm stuck.
Basically, I try this on the node.js side:
const mic = require('mic')
const Speaker = require('speaker')
const SocketIO = require('socket.io')
const ss = require('socket.io-stream')

...

io.on('connection', socket => {
    let micInstance = mic(micConfig)
    let micInputStream = micInstance.getAudioStream()
    let speakerInstance = new Speaker(speakerConfig)

    ...

    ss(socket).on('client-connect', (stream, data) => { // stream: duplex stream
        stream.pipe(speakerInstance) // speakerInstance: writable stream
        micInputStream.pipe(stream)  // micInputStream: readable stream
        micInstance.start()
    })
})
and this on the browser side:
const socket = io()

navigator.mediaDevices.getUserMedia({ audio: true }).then(clientMicStream => { // Get microphone input
    // Create a duplex stream using the socket.io-stream library's ss.createStream() method and emit it to the server
    const stream = ss.createStream() // stream: duplex stream
    ss(socket).emit('client-connect', stream)

    // Send microphone input to the server by piping it into the stream
    clientMicStream.pipe(stream) // clientMicStream: readable stream

    // Play audio received from the server through the stream
    const aCtx = new AudioContext()
    const analyser = aCtx.createAnalyser()
    const microphone = aCtx.createMediaStreamSource(stream)
    microphone.connect(analyser)
    analyser.connect(aCtx.destination)
}).catch(e => {
    console.error('Error capturing audio.', e)
    alert('Error capturing audio.')
})
The whole code can be viewed at: https://github.com/T-vK/node-browser-audio-stream-test
(The README.md contains instructions on how to set it up, if you want to test it.) The relevant code is in server.js (The setupStream() function contains the interesting code.) and client.html.
As you can see, I'm trying to send the duplex stream over the connection, pipe the microphone input into the duplex stream, and pipe the duplex stream to the speaker on each end (like I did in tvoip). It does not work atm, though.
Edit:
I'm not sure if I get this right, but the "stream" that I get from getUserMedia() is a MediaStream, and this media stream can have MediaStreamTracks (audio, video or both). In my case it would obviously just be one track (audio). But a MediaStreamTrack doesn't seem to be a stream as I know it from Node.js, meaning that it can't just be piped. So maybe it would have to be converted into one. I found this interesting library called microphone-stream which claims to be able to do that, but it doesn't seem to be available as a simple browser library; it seems to require wrapping your whole project with browserify, which seems like overkill. I'd like to keep it simple.