6
votes

I have a requirement to create a UDP file transfer system. I know TCP is guaranteed and much more reliable, but I need to transfer huge files between locations and I think the speed advantage in this project outweighs the benefits of using TCP. I’m just starting this project, but would like some guidance if anyone has done this before. I will be writing both sides (client and server), so I don’t need to worry about feature limitations in other products.

In a nutshell I need to:

  • Take large files and send them in chunks
  • Be able to throttle bandwidth from the client
  • Create some kind of packet numbering system for errors, retransmissions and assembling files by chunk on the server (yes, all the stuff we get from TCP for free :-)
  • Configurable datagram size – I think some firewalls complain if they get too big?
  • Anything else I may be missing

I’m starting this journey using UdpClient and would like to write this app in C#. Any words of wisdom (other than to use TCP)?


It’s been done with huge success. We used to use RocketStream.com, but they sold their product to another company for internal use only. We typically get speeds that are 30X faster than FTP or raw TCP byte transfers.

5
Use TCP :) "I think the speed advantage in this project outweighs the benefits of using TCP." What? Do you really expect to get any speed advantage over TCP (and why?) – ysdx
UDP usually performs better than TCP for short data transfers, not long ones. – Serge Wautier
At a guess, the speed advantages of UDP come from precisely the fact that it doesn't natively implement the things you say you're going to implement anyway. – millimoose
Or in other words, the larger the file, the more you care about a reliable transport. Shop for a TFTP library. – Hans Passant
We're actually seeing these speeds now with rocketstream.com, so it's not just marketing. The idea is to not ACK every packet. Start with an arbitrary number of packets, then check to make sure they got there. If they did, transmit the next set of packets. If they didn't, or there were any errors, retransmit and lower the number of packets between ACKs. The idea being big performance gains on better networks, smaller gains on crappy ones. – Scott

5 Answers

2
votes

In regards to

Configurable datagram size – I think some firewalls complain if they get too big?

One datagram can be up to 65,535 bytes. Considering the IP and UDP header overhead, you'll end up with 65,507 bytes for payload. But you have to consider how all the devices along your network path are configured. Typically most devices have an MTU size of 1500 bytes, so that will usually be your limit "on the internet". If you set up a dedicated network between your locations, you can increase the MTU on all devices.
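A quick sketch of that arithmetic in C# (assuming IPv4 with no IP options; the class and names here are my own illustration):

```csharp
using System;

// Usable UDP payload for a given on-the-wire packet size (IPv4, no options).
static class DatagramSizes
{
    public const int IpHeader = 20;   // bytes, IPv4 without options
    public const int UdpHeader = 8;   // bytes

    public static int MaxPayload(int packetSize) => packetSize - IpHeader - UdpHeader;
}

// DatagramSizes.MaxPayload(65535) == 65507  (absolute UDP maximum)
// DatagramSizes.MaxPayload(1500)  == 1472   (typical Ethernet MTU)
```

On a dedicated network with jumbo frames (MTU 9000), the same formula gives 8972 bytes of payload per datagram.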

Further, in regards to

Create some kind of packet numbering system for errors, retransmitions and assembling files by chunk on server (yes, all the stuff we get from TCP for free :-)

I think the best thing in your case would be to implement an application-level protocol, e.g.:

a 32-bit (4-byte) sequence number, a 4-byte CRC32 checksum, and any bytes left used for data
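A minimal sketch of that framing in C# (the 8-byte header layout and all names are my own illustration, assuming big-endian fields and the IEEE CRC-32 polynomial):

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical wire format: [4-byte sequence number][4-byte CRC32 of data][data ...]
static class Packet
{
    public static byte[] Build(uint seq, ReadOnlySpan<byte> data)
    {
        var buf = new byte[8 + data.Length];
        BinaryPrimitives.WriteUInt32BigEndian(buf, seq);
        BinaryPrimitives.WriteUInt32BigEndian(buf.AsSpan(4), Crc32(data));
        data.CopyTo(buf.AsSpan(8));
        return buf;
    }

    // Returns false on truncated or corrupted packets; those are simply
    // dropped and the sender retransmits them.
    public static bool TryParse(ReadOnlySpan<byte> buf, out uint seq, out byte[] data)
    {
        seq = 0;
        data = Array.Empty<byte>();
        if (buf.Length < 8) return false;
        seq = BinaryPrimitives.ReadUInt32BigEndian(buf);
        uint crc = BinaryPrimitives.ReadUInt32BigEndian(buf.Slice(4));
        if (Crc32(buf.Slice(8)) != crc) return false;
        data = buf.Slice(8).ToArray();
        return true;
    }

    // Bitwise CRC-32 (IEEE polynomial) -- fine for a sketch; use a lookup table in production.
    static uint Crc32(ReadOnlySpan<byte> data)
    {
        uint crc = 0xFFFFFFFF;
        foreach (byte b in data)
        {
            crc ^= b;
            for (int i = 0; i < 8; i++)
                crc = (crc >> 1) ^ ((crc & 1) != 0 ? 0xEDB88320u : 0u);
        }
        return ~crc;
    }
}
```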

Hope this gives you a bit of direction.

::edit::

From experience, I can tell you UDP is about 10–15% faster than TCP on dedicated, UDP-tuned networks.

1
votes

I'm not convinced the speed gain will be tremendous, but it would be an interesting experiment. Such a protocol will look and behave more like one of the traditional modem-based protocols; ZModem is probably one of the better examples to get some inspiration from (it implements an ack window, adaptive block size, etc.).
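The ack-window idea from those protocols (and the adaptive scheme described in the comments on the question) can be sketched roughly like this; the initial size, growth factor, and limits are arbitrary illustrations, not values from ZModem:

```csharp
using System;

// Grow the number of packets sent between ACK rounds on clean networks,
// back off when a round reports loss (constants are illustrative).
class AdaptiveWindow
{
    public int Size { get; private set; } = 16;

    public void OnRoundComplete(int lostPackets)
    {
        if (lostPackets == 0)
            Size = Math.Min(Size * 2, 1024);  // clean round: send more before the next ACK
        else
            Size = Math.Max(Size / 2, 1);     // loss: ACK more often
    }
}
```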

There are already some people who tried this, check out this site.

1
votes

That would be cool if you succeed.

Don't go into it without Wireshark. You'll need it.

For the algorithm, I guess that you have pretty much the idea of how to start. Maybe some pointers:

  1. Start with an MTU that is common to both endpoints, and use packets of only that size, so you'll have control over packet fragmentation (if you're coming down from TCP, I assume it's for more control over the low-level stuff).
  2. You'll probably want to look into STUN or TURN for punching holes through NATs.
  3. Look into ZModem – it also has nostalgic value :)
  4. Since you want to squeeze the maximum from your link, try to put as much as you can into the 'control packets' so you don't waste a single byte.
  5. I wouldn't use any CRC at the packet level; I'd guess the layers underneath are already handling that (UDP itself carries a 16-bit checksum).

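For point 4, one dense option is a selective-ACK control packet: a 4-byte base sequence number plus a 32-byte bitmap acknowledges up to 256 packets in 36 bytes. A sketch (the layout and names are hypothetical):

```csharp
using System;

// Selective-ACK control packet: [4-byte base sequence][32-byte bitmap].
// Bit i of the bitmap acknowledges packet (baseSeq + i).
static class AckPacket
{
    public static byte[] Build(uint baseSeq, bool[] received)  // received.Length <= 256
    {
        var buf = new byte[4 + 32];
        buf[0] = (byte)(baseSeq >> 24);
        buf[1] = (byte)(baseSeq >> 16);
        buf[2] = (byte)(baseSeq >> 8);
        buf[3] = (byte)baseSeq;
        for (int i = 0; i < received.Length; i++)
            if (received[i])
                buf[4 + i / 8] |= (byte)(1 << (i % 8));
        return buf;
    }

    public static bool IsAcked(byte[] buf, uint baseSeq, uint seq)
    {
        uint offset = seq - baseSeq;   // wraps to a huge value for seq < baseSeq -> not acked
        if (offset >= 256) return false;
        return (buf[4 + offset / 8] & (1 << (int)(offset % 8))) != 0;
    }
}
```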
1
votes

I just had an idea...

  1. Break the file up into 16k chunks (the length is arbitrary)
  2. Create a hash of each chunk
  3. Transmit all the hashes of the chunks, using any protocol
  4. At the receiving end, prepare by hashing everything you have on your hard drive, network, I mean everything, in 16k chunks
  5. Compare the received hashes to your local hashes and reconstruct the data you already have
  6. Download the rest using any protocol
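A sketch of steps 1, 2, and 5 in C# (the 16 KB chunk size comes from the list above; the SHA-256 hash and all names are my own choices):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

static class ChunkHashes
{
    const int ChunkSize = 16 * 1024;

    // Steps 1 + 2: hash the input in 16k chunks.
    // (Assumes full reads per chunk, which holds for FileStream/MemoryStream.)
    public static List<string> HashChunks(Stream input)
    {
        var hashes = new List<string>();
        using var sha = SHA256.Create();
        var buf = new byte[ChunkSize];
        int n;
        while ((n = input.Read(buf, 0, buf.Length)) > 0)
            hashes.Add(Convert.ToHexString(sha.ComputeHash(buf, 0, n)));
        return hashes;
    }

    // Step 5: indices of chunks the receiver doesn't already have locally,
    // i.e. the only chunks that still need to be downloaded.
    public static List<int> Missing(List<string> remoteHashes, HashSet<string> localHashes)
    {
        var missing = new List<int>();
        for (int i = 0; i < remoteHashes.Count; i++)
            if (!localHashes.Contains(remoteHashes[i]))
                missing.Add(i);
        return missing;
    }
}
```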

I know that I'm 6 months behind the schedule, but I just couldn't resist.

0
votes

Others have said more interesting things, but I would like to point out that you need to make sure you use a good compression algorithm. That will make a world of difference.
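For example, with the framework's built-in GZipStream (a sketch; a real protocol would need a header flag so the receiver knows whether a given chunk was compressed, since already-compressed data can come out larger):

```csharp
using System.IO;
using System.IO.Compression;

static class ChunkCompressor
{
    // Compress a chunk before sending; fall back to the raw bytes if gzip doesn't help.
    public static byte[] Compress(byte[] chunk)
    {
        using var ms = new MemoryStream();
        using (var gz = new GZipStream(ms, CompressionLevel.Fastest))
            gz.Write(chunk, 0, chunk.Length);
        byte[] packed = ms.ToArray();   // ToArray still works after the stream is closed
        return packed.Length < chunk.Length ? packed : chunk;
    }
}
```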

Also, I would recommend validating your assumptions about the possible speed improvement: build a trivial system that just sends data (without worrying about loss, corruption, or other problems) and see what bandwidth you get. That will at least give you a realistic upper bound for what can be done.

Finally, consider why you are taking on this task. Will the speed gains be worth the amount of time spent developing it?