6
votes

I need an expert opinion, please, and I apologize if the question itself is confused.

I was reading about the structure of VoIP applications (client/server), and UDP is mostly recommended for voice streams. I also checked some voice chat applications such as Paltalk and Inspeak. Their sites say they use UDP voice streams, which doesn't seem correct for the reasons below.

I examined the traffic and ports used by Paltalk and Inspeak. They have both UDP and TCP ports open, and with a packet sniffer I can see that there is not much UDP communication; most of the traffic is TCP.

Also, as far as I know, with UDP a server cannot send data to a client behind NAT (a DSL router), and UDP broadcast is not an option for Internet-based voice chat applications. That is why Yahoo mentions in its documentation that Yahoo Messenger switches to TCP if UDP communication is not possible.

So my questions are:

  1. Am I misunderstanding something in the statements above?

  2. If UDP is not possible, do those chat applications use a TCP stream for voice?

  3. In my experience, TCP voice streams introduce delay (the voice does not break up, but it is delayed), so what is the best structure for voice chat server/client communication?

So far I think that the client could send data as UDP packets to the server, and the server could distribute the packets to clients over TCP streams. Is this a proper solution? I mean, is this what commercial voice chat applications do?

Thanks, your answer will help me and a lot of other programmers.

JF


2 Answers

3
votes

UDP has less overhead (in terms of total packet size), so you can squeeze more audio into the channel's bandwidth.
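To put rough numbers on the overhead difference, here is a small back-of-the-envelope sketch. The 160-byte payload (20 ms of 8 kHz, 8-bit audio) is my own assumption for illustration, and the calculation ignores RTP headers, IP/TCP options, and link-layer framing.

```python
# Header sizes in bytes (minimum sizes, no options).
IPV4_HEADER = 20
UDP_HEADER = 8
TCP_HEADER = 20

def overhead_ratio(payload_bytes, transport_header):
    """Fraction of each packet that is IP + transport header rather than audio."""
    headers = IPV4_HEADER + transport_header
    return headers / (headers + payload_bytes)

# Assumption: one 20 ms frame of 8 kHz, 8-bit audio = 160 bytes.
PAYLOAD = 160

udp_overhead = overhead_ratio(PAYLOAD, UDP_HEADER)
tcp_overhead = overhead_ratio(PAYLOAD, TCP_HEADER)

print(f"UDP: {udp_overhead:.1%} of each packet is header")
print(f"TCP: {tcp_overhead:.1%} of each packet is header")
```

The gap grows as frames get smaller, which is why it matters more for low-bitrate codecs than for large payloads.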

UDP is also unreliable: packets that are sent may never be received, or may be received out of order. That is actually OK for voice applications, since you can tolerate a small number of lost packets at the cost of some signal quality and keep going (as opposed to downloading a file, where every byte counts).

Can you use TCP? Sure, why not... it's slightly more overhead, but that may not matter.

SIP is a signalling standard for voice/media sessions that supports UDP and TCP. Most deployments use UDP because of the lower overhead.
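As a rough illustration of why loss and reordering are tolerable, a receiver can number the frames, put them back in order, and play silence for anything that never arrived. This is only a sketch: the `playout` function, the `(seq, payload)` tuple format, and the frame size are hypothetical, and real clients use a jitter buffer with timing and usually some form of loss concealment.

```python
def playout(packets, first_seq, count, frame_size=160):
    """Return `count` frames in sequence order, starting at `first_seq`.

    `packets` is an iterable of (seq, payload) tuples in arrival order.
    Missing frames are replaced by silence; duplicates are ignored.
    """
    by_seq = {}
    for seq, payload in packets:
        by_seq.setdefault(seq, payload)  # keep the first copy of each frame
    silence = b"\x00" * frame_size
    return [by_seq.get(seq, silence) for seq in range(first_seq, first_seq + count)]

# Frame 1 was lost and frame 2 arrived before frame 0.
arrived = [(2, b"cccc"), (0, b"aaaa")]
frames = playout(arrived, first_seq=0, count=3, frame_size=4)
print(frames)
```

With TCP the same lost frame would be retransmitted and everything behind it would wait, which is where the delay comes from.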

The Skype protocol prefers UDP where possible, and falls back to TCP.
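A "prefer UDP, fall back to TCP" policy could look roughly like the sketch below. It is not how Skype actually does it (that protocol is proprietary); `udp_reachable`, `choose_transport`, and the probe payload are hypothetical names of mine, and the sketch assumes the server answers any UDP probe with some datagram.

```python
import socket

def udp_reachable(server_addr, timeout=1.0, probe=b"probe"):
    """Return True if any UDP datagram comes back within `timeout` seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(probe, server_addr)
            sock.recvfrom(2048)
            return True
        except OSError:  # covers timeouts and ICMP-triggered errors
            return False

def choose_transport(server_addr, timeout=1.0):
    """Prefer UDP when the probe gets through, otherwise fall back to TCP."""
    return "udp" if udp_reachable(server_addr, timeout) else "tcp"
```

A real client would probably retry the probe a few times before giving up, since a single lost datagram should not force the whole call onto TCP.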

In SIP situations, the NAT problem is solved by sending a NAT keep-alive packet (any request/response data) to keep the channel open, and by exploiting the fact that most NATs will accept replies on the same source port the connection was opened from. This isn't foolproof, and it often requires a proxy server mediating the connection between two NAT'd peers, but it's used in many deployments.
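The keep-alive idea can be sketched as a background thread that sends a small datagram from the same socket (and so the same source port) the client uses for its real traffic. The `start_keepalive` helper, the payload, and the 15-second interval are assumptions of mine; a suitable interval depends on the NAT device.

```python
import socket
import threading

def start_keepalive(sock, server_addr, interval=15.0, payload=b"keepalive"):
    """Send `payload` to `server_addr` every `interval` seconds.

    Returns a threading.Event; call .set() on it to stop the loop.
    """
    stop = threading.Event()

    def loop():
        while not stop.is_set():
            sock.sendto(payload, server_addr)
            stop.wait(interval)  # sleeps, but wakes early when stopped

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

Usage would be something like `stop = start_keepalive(client_sock, ("server.example", 5060))`, then `stop.set()` when the session ends. The host name here is a placeholder.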

STUN, TURN, and ICE are additional methods that help with NAT scenarios, and especially in p2p (serverless) situations.

More information on NAT issues and media:

http://www.voip-info.org/wiki/view/NAT+and+VOIP

http://en.wikipedia.org/wiki/UDP_hole_punching

http://www.h-online.com/security/features/How-Skype-Co-get-round-firewalls-747197.html

If you're implementing a voice service of some kind, a system like FreeSWITCH provides most of the tools you need to deliver media to distributed clients:

http://www.freeswitch.org/

1
votes

I see the question is 3 years overdue, but I see no accepted answer, so I'll take a shot at it.

1- Your statements are correct.

2- Correct, either TCP or UDP can be used for the audio stream.

3- Combining TCP and UDP for the audio stream is not useful. If UDP works for transmission to the server, it will also work for reception. That is how all NAT firewalls work: they forward a datagram received from an internal host to the remote host after rewriting the IP header so that the packet appears to come from them, and when they receive the response, they forward it back to the internal host. NAT firewalls differ in how long the NAT mapping stays alive, but this does not matter for the audio part of the call, since audio flows constantly in both directions during a call. It matters more for the signalling part of the call, which uses the SIP protocol. So I would recommend using TCP for SIP, as the TCP session has a default timeout of 900 s, so keep-alive messages are needed less frequently.
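On the server side, point 3 boils down to replying to whatever source address the datagram arrived from, which for a client behind NAT is the router's public address and mapped port. A minimal sketch (the `relay_once` name is mine, and it simply echoes the data back):

```python
import socket

def relay_once(server_sock):
    """Receive one datagram and answer to the address it was observed from.

    For a client behind NAT, `observed_addr` is the NAT's public IP and the
    port it mapped for this flow, not the client's private address.
    """
    data, observed_addr = server_sock.recvfrom(2048)
    server_sock.sendto(data, observed_addr)
    return observed_addr
```

A real voice server would look up the other participants of the call and send the frame to their observed addresses instead of echoing it to the sender.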

Now some applications you mentioned do not use SIP for session initiation, and hence have proprietary ways of signalling.

Other applications, such as Skype, take advantage of something called 'hole punching' to allow direct client-to-client (peer-to-peer) communication. The advantage is that the server does not sit in the middle of the voice stream, which can effectively reduce latency, making TCP a potential choice for the audio stream.

The developers of Asterisk, the well-known open-source PBX, recognized the problems with SIP, which requires having lots of ports open, and developed their own protocol, called IAX, to carry signalling and media over one port. I would encourage you to consider implementing IAX for your client/server, because it ensures that if a client is able to connect (through signalling), then it is able to make calls.
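The single-port idea can be illustrated by tagging every datagram with a frame kind, so that signalling and media share one socket. To be clear, this is NOT the actual IAX wire format; the 3-byte header, the constants, and the function names below are a made-up layout just to show the multiplexing.

```python
import struct

# Hypothetical frame kinds (not IAX values).
SIGNALLING = 0
MEDIA = 1

# Hypothetical header: 1-byte kind, 2-byte call id, network byte order.
HEADER = struct.Struct("!BH")

def pack_frame(kind, call_id, payload):
    """Prefix `payload` with the kind and call id."""
    return HEADER.pack(kind, call_id) + payload

def unpack_frame(datagram):
    """Split a datagram into (kind, call_id, payload)."""
    kind, call_id = HEADER.unpack_from(datagram)
    return kind, call_id, datagram[HEADER.size:]

# Both kinds of frame can travel over the same UDP socket and port.
invite = pack_frame(SIGNALLING, 7, b"NEW")
audio = pack_frame(MEDIA, 7, b"\x01\x02\x03")
print(unpack_frame(invite), unpack_frame(audio))
```

Because everything goes through one port, a NAT mapping that lets the signalling through also lets the audio through.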