UDP Load Testing: How do you simulate many UDP clients?

Question

I am developing a tool to perform load testing on a UDP server (using C#/.NET 4.0 running on NT 6.x, although that is less relevant). The server talks to tens of thousands of clients, where the communication with each client is very low traffic and infrequent. All communications follow a request-reply pattern, where one side initiates communication with the other side, which then replies. When the server needs to send something to a client, it looks up the last known endpoint (IP + port) of that client, sends a UDP packet, and listens for a reply on a single known port which is used to receive communications from all clients. When the client initiates communication, it already knows the endpoint of the server, and simply sends a packet from an ephemeral port and waits for a reply on the same port. The same ephemeral port is used for the lifetime of the client.

The design of the load testing tool is pretty simple: emulate the behavior, state and decision making of each client, to a low yet sufficient complexity. Since the communication with each client is only occasional (every few seconds), and the amount of processing required for each communication is minimal, the best approach I can think of is to use a single thread with a single socket to perform all the communication for a large number of simulated clients. Even then, the thread most likely won't be fully busy and the socket won't be saturated. Unfortunately, I have encountered two problems with that approach, arising from the fact that each client sends and receives from its own port:

A socket will only allow sending a UDP packet from either a system-allocated ephemeral port or a specific port the socket is bound to.
A socket will only receive UDP packets from the port it is bound to.
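The two constraints above can be seen in a minimal sketch. It is written in Python for brevity rather than in C#, and the loopback "server" and the function name `demo` are assumptions made up for illustration, not part of the tool being described. A socket bound to port 0 gets one system-allocated ephemeral port, every datagram it sends carries that source port, and replies come back to that same port:

```python
import socket

def demo(rounds=3):
    # Hypothetical stand-in for the real server, listening on loopback.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    server.settimeout(2.0)

    # One simulated client. Port 0 asks the OS for an ephemeral port.
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.bind(("127.0.0.1", 0))
    client.settimeout(2.0)
    client_port = client.getsockname()[1]

    ports_seen_by_server = []
    replies = []
    for _ in range(rounds):
        client.sendto(b"request", server.getsockname())
        data, addr = server.recvfrom(1500)
        ports_seen_by_server.append(addr[1])   # source port of the request
        server.sendto(b"reply", addr)          # reply goes to that endpoint
        reply, _ = client.recvfrom(1500)       # received on the bound port
        replies.append(reply)

    client.close()
    server.close()
    return client_port, ports_seen_by_server, replies
```

Calling `demo()` returns the client's port, the source ports the server observed, and the replies. The observed ports all equal the client's bound port, which is why one ordinary socket cannot stand in for many clients.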

One socket per client

The above two constraints seem to mean that I must create a socket for each client, since the UDP packet must originate from a certain port and a reply will be sent to that port. So the first possible solution is to do just that: create a socket per simulated client. Let's say we're simulating 30,000 clients on a single machine:

Is creating 30,000 sockets even feasible? Is it a best practice? Is it performant? Will Windows even let you bind 30,000 sockets to 30,000 different ports?
How do I check across 30,000 client sockets if any data was sent by the server? Do I poll all the sockets periodically to see if any data was received? Is there a way to wait on all 30,000 sockets and get the first packet that arrives at any of them?
What resources does the operating system allocate for each socket? What are the limits of each of those resources and what are the implications of reaching them?

A single socket for all clients

A different approach would be to use a single socket, but the two problems I mentioned earlier must first be solved somehow:

The first problem, having the UDP packet originate at a port other than the socket's port, is solvable. It involves creating a raw socket and constructing the UDP header yourself, which means you can specify any source and destination port you'd like. The only difficulty is calculating the optional yet important UDP checksum, which requires not only the UDP header and payload, but also the source and destination IP addresses. The source address is problematic, since obtaining it requires calling Win32 APIs (GetBestInterface and GetAdaptersInfo), which involves several native structures and lots of unmanaged memory allocations. That is a potential reliability pitfall from a .NET perspective, but it can be done.
The second problem, using a single socket to receive UDP packets from a list (or range) of ports, remains unsolved by me. Even with a raw socket, the OS demands that I bind the socket to a specific single port before allowing me to perform receive operations. Is there a way to do that? There are always packet sniffing techniques, but I'd rather avoid them unless it can be done from managed code in a reliable and somewhat straightforward way (in which case, I'm open to suggestions).
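For reference, the checksum calculation mentioned in the first point can be sketched as follows. This is a Python illustration of the standard UDP-over-IPv4 checksum (a 16-bit one's complement sum over a pseudo-header, the UDP header and the payload), not code from the tool in question. The function names are hypothetical, and the source IP address is simply passed in, so the address lookup described above is assumed to have been done already:

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total

def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero byte, protocol 17, UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, socket.IPPROTO_UDP, udp_length))

def build_udp_datagram(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    """Return UDP header plus payload with the checksum field filled in."""
    length = 8 + len(payload)                # UDP header is 8 bytes
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    pseudo = pseudo_header(src_ip, dst_ip, length)
    checksum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    if checksum == 0:                        # 0 on the wire means "no checksum"
        checksum = 0xFFFF
    header = struct.pack("!HHHH", src_port, dst_port, length, checksum)
    return header + payload
```

A receiver validates the datagram by summing the pseudo-header and the whole datagram, checksum included; a correct datagram folds to 0xFFFF. The result of `build_udp_datagram` would still need an IP header (or a raw socket that adds one) before it can be sent.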

Other approaches?

Is there another approach I haven't thought of? Am I on the wrong track? Can you think of a better solution? I'd love to hear any suggestions you might have.

Len Holgate · Accepted Answer · 2011-02-15T09:57:06

Yes, you can easily create more than 30,000 sockets on a Windows machine, but you may need to tune MAXUSERPORT (see here).

Use I/O completion ports or async I/O; you then don't need to worry about 'polling', and the scalability 'just works'.
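As a rough illustration of that idea (not the framework or tool referred to in this answer), here is a sketch in Python's asyncio, chosen only for brevity. Each simulated client owns its own socket and ephemeral port, and the event loop delivers replies through callbacks, so nothing polls. On Windows, recent Python versions run asyncio on a proactor loop built on I/O completion ports; in .NET the corresponding building blocks would be the asynchronous socket calls such as `BeginReceiveFrom`. The class names, the loopback echo server and `run_clients` are assumptions for the example:

```python
import asyncio

class EchoServer(asyncio.DatagramProtocol):
    """Hypothetical stand-in for the server under test: echoes each datagram."""
    def connection_made(self, transport):
        self.transport = transport

    def datagram_received(self, data, addr):
        self.transport.sendto(data, addr)

class SimulatedClient(asyncio.DatagramProtocol):
    """One simulated client with its own socket and ephemeral port."""
    def __init__(self, client_id, reply_future):
        self.client_id = client_id
        self.reply_future = reply_future

    def connection_made(self, transport):
        # Send one request as soon as the socket is ready.
        transport.sendto(b"hello %d" % self.client_id)

    def datagram_received(self, data, addr):
        # Called by the event loop when a reply arrives; no polling involved.
        if not self.reply_future.done():
            self.reply_future.set_result(data)

async def run_clients(count):
    loop = asyncio.get_running_loop()
    server, _ = await loop.create_datagram_endpoint(
        EchoServer, local_addr=("127.0.0.1", 0))
    server_addr = server.get_extra_info("sockname")

    transports, futures = [], []
    for i in range(count):
        fut = loop.create_future()
        transport, _ = await loop.create_datagram_endpoint(
            lambda i=i, fut=fut: SimulatedClient(i, fut),
            local_addr=("127.0.0.1", 0),     # port 0: one ephemeral port per client
            remote_addr=server_addr)
        transports.append(transport)
        futures.append(fut)
    try:
        # Wait for every client's reply, in client order.
        return await asyncio.wait_for(asyncio.gather(*futures), timeout=5.0)
    finally:
        for t in transports:
            t.close()
        server.close()
```

Running `asyncio.run(run_clients(100))` returns one reply per client. A real load test would add per-client state, timers for the occasional traffic, and handling for lost datagrams, none of which is shown here.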

The main resource issue is non-paged pool, but this has become far less of an issue on Vista or later (see here). The next issue is the I/O locked pages limit, which may matter if you are posting very large buffers for your reads. Given that this is UDP, I assume you'll have 'sensible' sized datagrams, in which case the locked pages limit is unlikely to be an issue.

I've blogged about some of the scalability issues here: http://www.serverframework.com/asynchronousevents/2010/12/one-million-tcp-connections.html

I've written tools similar to what you're trying to do using my C++ socket server framework. There's a UDP test tool example that ships as part of the framework; it sends a configurable number of datagrams from unique client ports and waits for replies, and it is easy to adjust to your specific datagram format and response requirements (see here).
