5 votes

I was in a meeting on Google Meet and saw that you could turn on real-time subtitles. They've actually got a demo here on how real-time speech-to-text can be done, so that part doesn't confuse me.

I had also been wanting to experiment with WebRTC (which I believe Google Meet uses) just to see its capabilities - e.g. the ability to share a screen without any additional plugins.

However, I've always been under the impression that a WebRTC video/audio stream is sent peer-to-peer between clients. The questions I have, therefore, are:

  • How, then, is Google able to send the audio stream off to a server for analysis?
  • Is it possible to send the audio stream to the client as well as to a server?
  • Would you have to create two of the same audio stream (I don't know if this is even possible; see the sketch after this list), send one over WebRTC to the other peer(s), and the other to a server for analysis?
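For reference, here is a rough sketch of what the "two copies of the same stream" idea from the last bullet could look like in the browser, assuming a hand-rolled signaling layer. The `sendOfferToPeer` / `sendOfferToServer` calls are placeholders (commented out), not real APIs; only `MediaStreamTrack.clone()` and the standard `RTCPeerConnection` methods are actual browser APIs.

```typescript
// Sketch: fork one microphone capture to two RTCPeerConnections,
// one toward another participant and one toward a hypothetical analysis server.
async function forkAudio(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const micTrack = stream.getAudioTracks()[0];

  // MediaStreamTrack.clone() gives an independent copy of the same capture,
  // so the two connections can be negotiated and closed separately.
  const trackForPeer = micTrack;
  const trackForServer = micTrack.clone();

  const pcPeer = new RTCPeerConnection();
  pcPeer.addTrack(trackForPeer, stream);

  const pcServer = new RTCPeerConnection();
  pcServer.addTrack(trackForServer, new MediaStream([trackForServer]));

  // Standard offer/answer for each connection; how the SDP travels
  // (WebSocket, fetch, etc.) is up to your own signaling layer.
  const peerOffer = await pcPeer.createOffer();
  await pcPeer.setLocalDescription(peerOffer);
  // await sendOfferToPeer(peerOffer);        // hypothetical signaling call

  const serverOffer = await pcServer.createOffer();
  await pcServer.setLocalDescription(serverOffer);
  // await sendOfferToServer(serverOffer);    // hypothetical signaling call
}
```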

How do they achieve this - and if they don't use WebRTC, is it possible to achieve this with WebRTC?


1 Answer

5 votes

Google Meet is using WebRTC. The "peer" in that case is a server, not another browser. Although this old article is six years old and some details have changed, much of it is still true. From the server, Google can do the audio processing.
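A minimal browser-side sketch of that model is below, assuming a placeholder signaling endpoint (`https://media.example.com/session` is not a real URL, and this is not Google Meet's actual API). The point is that, from the browser's perspective, the media server is just the single WebRTC peer; once the server terminates the stream, it can both forward it to other participants and feed the audio into an analysis pipeline.

```typescript
// Sketch: connect to a media server as if it were any other WebRTC peer.
async function connectToMediaServer(): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // Send the SDP offer to the server's (placeholder) signaling endpoint
  // and apply its answer. After this, media flows browser <-> server.
  const response = await fetch("https://media.example.com/session", {
    method: "POST",
    headers: { "Content-Type": "application/sdp" },
    body: offer.sdp,
  });
  const answerSdp = await response.text();
  await pc.setRemoteDescription({ type: "answer", sdp: answerSdp });

  return pc;
}
```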

This video describes the architecture required for speech-to-text (and, in fact, translation plus text-to-speech back again).
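In outline, that server-side pipeline looks something like the sketch below. The three service functions are injected as parameters because they stand in for whatever speech-to-text, translation, and text-to-speech backends you choose; none of them refers to a real API here.

```typescript
// Sketch: received audio -> speech-to-text -> translation -> text-to-speech.
type AudioChunk = Uint8Array;

interface SpeechServices {
  transcribe(audio: AudioChunk, lang: string): Promise<string>;
  translate(text: string, from: string, to: string): Promise<string>;
  synthesize(text: string, lang: string): Promise<AudioChunk>;
}

async function captionAndDub(
  services: SpeechServices,
  audio: AudioChunk,
  sourceLang: string,
  targetLang: string,
): Promise<{ caption: string; dubbedAudio: AudioChunk }> {
  // Speech-to-text on the audio the server received over WebRTC.
  const transcript = await services.transcribe(audio, sourceLang);
  // Translate the transcript into the viewer's language.
  const caption = await services.translate(transcript, sourceLang, targetLang);
  // Optionally synthesize the translated text back into audio.
  const dubbedAudio = await services.synthesize(caption, targetLang);
  return { caption, dubbedAudio };
}
```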