I have done voice chatting between two node.js servers before (see: tvoip), which works quite well, but now I would like to do it between a node.js server and a browser. How could this be done?
From node.js to node.js I simply used raw PCM streams over a TCP connection.
For the browser this is probably not going to be that easy, right? I mean the browser doesn't really offer a TCP API. It does offer a WebSocket API, but does it handle streams? Would I have to convert the streams and if so into what format and how? What protocol should I use? Are there any helpful libraries to accomplish this already? Is socket.io-stream a viable library to send these kinds of streams?
From what I understand, audio streams in the browser are in PCM format, so they should be compatible with the streams I have in Node.js. Is that assumption correct?
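As far as I can tell, the Web Audio API hands out samples as 32-bit floats in the range [-1, 1], while my Node.js streams use 16-bit signed little-endian integers (S16_LE), so I assume I would at least need a conversion step, roughly like this sketch (`floatTo16BitPCM` is just a name I made up, and the sample rates on both ends would also have to match):

```javascript
// Convert Web Audio Float32 samples (range [-1, 1]) into 16-bit signed
// little-endian PCM, the S16_LE format used on the Node.js side.
function floatTo16BitPCM(float32Samples) {
    const out = new DataView(new ArrayBuffer(float32Samples.length * 2))
    for (let i = 0; i < float32Samples.length; i++) {
        // Clamp to [-1, 1], then scale to the 16-bit signed integer range
        const s = Math.max(-1, Math.min(1, float32Samples[i]))
        out.setInt16(i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7FFF), true) // true = little-endian
    }
    return out.buffer // ArrayBuffer of raw PCM bytes
}
```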
I have managed to pipe the browser mic input to the browser speaker output like this:
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8"/>
</head>
<body>
    <!-- alternative method that also works
    <audio></audio>
    <script>
        navigator.mediaDevices.getUserMedia({ audio: true }).then(function(stream) {
            const audio = document.querySelector('audio')
            audio.srcObject = stream
            audio.onloadedmetadata = function(e) {
                audio.play()
            }
        }).catch(console.error)
    </script>
    -->
    <script>
        navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
            const aCtx = new AudioContext()
            const analyser = aCtx.createAnalyser()
            const microphone = aCtx.createMediaStreamSource(stream)
            microphone.connect(analyser)
            analyser.connect(aCtx.destination)
        }).catch(err => {
            console.error("Error getting audio stream from getUserMedia", err)
        })
    </script>
</body>
</html>
As you can see I found two solutions. I will try to base the node<->browser voice chat on the second one.
For Node.js I came up with this code to pipe a node.js mic input to a node.js speaker output:
const mic = require('mic')
const Speaker = require('speaker')

const micInstance = mic({           // arecord -D hw:0,0 -f S16_LE -r 44100 -c 2
    device: 'hw:2,0',               // -D hw:0,0
    encoding: 'signed-integer',     // -f S
    bitwidth: '16',                 // 16
    endian: 'little',               // _LE
    rate: '44100',                  // -r 44100
    channels: '1',                  // -c 2
    debug: true
})
const micInputStream = micInstance.getAudioStream()

const speakerInstance = new Speaker({ // | aplay -D plughw:CARD=0,DEV=0
    channels: 1,
    bitDepth: 16,
    sampleRate: 44100,
    signed: true,
    device: 'plughw:2,0' //'plughw:NVidia,7'
})
speakerInstance.on('open', () => {
    console.log("Speaker received stuff")
})

// Pipe the readable microphone stream to the writable speaker stream:
micInputStream.pipe(speakerInstance)

micInputStream.on('data', data => {
    //console.log("Received Input Stream: " + data.length)
})
micInputStream.on('error', err => {
    console.error("Error in Input Stream: " + err)
})

micInstance.start()
console.log('Started')
Finding the right device for mic and speaker can be a bit tricky if you are not familiar with ALSA on Linux. It is explained here in case you are unsure. I am not certain how it works on Windows and macOS with SoX.
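On Linux, the available ALSA devices can be listed with the alsa-utils tools; the card/device numbers they print map onto the `hw:X,Y` strings used above:

```shell
# List ALSA capture devices (candidates for the mic `device` option)
arecord -l

# List ALSA playback devices (candidates for the Speaker `device` option)
aplay -l

# A line like "card 2: ..., device 0: ..." corresponds to hw:2,0 / plughw:2,0
```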
I then came up with a small test application to connect the two ideas using socket.io-stream (a socket.io library that allows sending streams over a socket). And obviously, this is where I'm stuck.
Basically, I try this on the node.js side:
const mic = require('mic')
const Speaker = require('speaker')
const SocketIO = require('socket.io')
const ss = require('socket.io-stream')

...

io.on('connection', socket => {
    let micInstance = mic(micConfig)
    let micInputStream = micInstance.getAudioStream()
    let speakerInstance = new Speaker(speakerConfig)

    ...

    ss(socket).on('client-connect', (stream, data) => { // stream: duplex stream
        stream.pipe(speakerInstance) // speakerInstance: writable stream
        micInputStream.pipe(stream)  // micInputStream: readable stream
        micInstance.start()
    })
})
and this on the browser side:
const socket = io()

navigator.mediaDevices.getUserMedia({ audio: true }).then(clientMicStream => { // Get microphone input
    // Create a duplex stream using the socket.io-stream library's ss.createStream() method and emit it to the server
    const stream = ss.createStream() // stream: duplex stream
    ss(socket).emit('client-connect', stream)

    // Send microphone input to the server by piping it into the stream
    clientMicStream.pipe(stream) // clientMicStream: readable stream

    // Play audio received from the server through the stream
    const aCtx = new AudioContext()
    const analyser = aCtx.createAnalyser()
    const microphone = aCtx.createMediaStreamSource(stream)
    microphone.connect(analyser)
    analyser.connect(aCtx.destination)
}).catch(e => {
    console.error('Error capturing audio.', e)
    alert('Error capturing audio.')
})
The whole code can be viewed at: https://github.com/T-vK/node-browser-audio-stream-test
(The README.md contains instructions on how to set it up, if you want to test it.) The relevant code is in server.js (The setupStream() function contains the interesting code.) and client.html.
As you can see, I'm trying to send the duplex stream over the connection, pipe the microphone input into the duplex stream, and pipe the duplex stream to the speaker on each end (like I did in tvoip). It does not work atm, though.
Edit:
I'm not sure if I get this right, but the "stream" that I get from getUserMedia() is a MediaStream, and this media stream can have MediaStreamTracks (audio, video or both). In my case it would obviously just be one track (audio). But a MediaStreamTrack doesn't seem to be a stream as I know it from Node.js, meaning that it can't just be piped. So maybe it would have to be converted into one. I found this interesting library called microphone-stream which claims to be able to do that, but it doesn't seem to be available as a simple browser library; it seems to require wrapping your whole project with browserify, which seems like overkill. I'd like to keep it simple.