
We are using Netty 3.2.1 in our gaming company for connection management between client and server. Our aims are to:

  1. Handle blip scenarios such as network outages, and recover from them as quickly as possible.
  2. Increase the number of connections a single server can handle without dropping the existing connected sockets.

Blip scenario: In a real-time environment there is a chance of blips due to ISP failures, so all existing client connections break and then reconnect at the same time (our clients fire a reconnection as soon as an existing connection breaks). For example, if a server is connected to 25,000 clients, all 25,000 try to connect back to the server at the same time during a blip, which puts huge pressure on the server. Since older Netty versions do not provide a way to enable/disable OP_ACCEPT, we started throttling connections beyond a limit of 100 concurrent SSL handshakes: we pause the boss thread for a second whenever the limit is crossed. We are not yet ready to use the Netty 4.0 beta, which provides this feature.
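The "100 concurrent SSL handshakes" limit described above can be expressed as a simple permit counter instead of pausing the boss thread. This is only a sketch of the idea using a plain `java.util.concurrent.Semaphore` (the class name and limit are illustrative, not the actual implementation):

```java
import java.util.concurrent.Semaphore;

// Sketch: cap the number of SSL handshakes in flight. A handler would call
// tryBegin() on channelConnected and end() when the handshake completes or fails.
public class HandshakeThrottle {
    private final Semaphore permits;

    public HandshakeThrottle(int maxConcurrentHandshakes) {
        this.permits = new Semaphore(maxConcurrentHandshakes);
    }

    /** Returns true if a handshake slot was free; false means "back off". */
    public boolean tryBegin() {
        return permits.tryAcquire();
    }

    /** Release the slot once the SSL handshake finishes (success or failure). */
    public void end() {
        permits.release();
    }

    public static void main(String[] args) {
        HandshakeThrottle t = new HandshakeThrottle(100);
        for (int i = 0; i < 100; i++) t.tryBegin();
        System.out.println(t.tryBegin()); // prints false: limit reached
        t.end();
        System.out.println(t.tryBegin()); // prints true after one release
    }
}
```

Compared to sleeping the boss thread for a fixed second, a permit count releases accept capacity the moment a handshake finishes rather than on a timer.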

Another scenario we observed during stress testing: when there is a huge number of incoming connections, the server takes a long time to accept them all and stabilize (around 30 minutes for 25K connections all trying to connect at the same time). But if we ramp up the load slowly, Netty accepts all the connections easily in far less time (around 3 to 5 minutes). We want to build the system to be resilient to these blip scenarios.
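Since a slow ramp-up works, one common client-side mitigation is to make the reconnect storm ramp itself: exponential backoff with random jitter spreads 25K simultaneous reconnects over a window instead of one instant. A minimal sketch (the constants are illustrative assumptions, not values from this setup):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch: exponential backoff with "full jitter" for client reconnects,
// so all clients do not hammer the server in the same instant after a blip.
public class ReconnectBackoff {
    static final long BASE_MS = 1000;   // ceiling for the first retry: ~1s
    static final long CAP_MS  = 60000;  // never wait more than 60s

    /** Delay before the given retry attempt (0-based), in milliseconds. */
    public static long delayMillis(int attempt) {
        // Ceiling doubles each attempt, capped; the actual delay is drawn
        // uniformly from [0, ceiling] so retries spread out over the window.
        long ceiling = Math.min(CAP_MS, BASE_MS << Math.min(attempt, 20));
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }
}
```

A client would sleep `delayMillis(attempt)` before each reconnect and reset `attempt` to 0 on success; the server then sees the 25K reconnects arrive spread over roughly a minute.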

  • I added some logging to the existing Netty jar, and it seems to fail in the read() method when clients disconnect under heavy load.
  • Using the Netty 3.6 Final jar did not help much.
  • Would using an ExecutionHandler help?
  • Disabling the throttling resulted in an OOM exception on the server.

Please suggest any possible approaches to address this issue. Are there any other ways to throttle the connections?

Setup details and configuration

Clients - JMeter framework on 2 GB machines. Load - 50,000 clients. Messages per second - 12,000 (the messages are "client hello" and "server hello" strings). ulimit on all machines is 100,000.

Server - 12 cores with HT (24 logical cores) and 48 GB RAM. Worker threads - 50 (roughly twice the number of logical cores). All TCP internal buffers of the Linux machines are configured for peak limits.
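"Configured for peak limits" typically corresponds to kernel settings like the following. This is an illustrative config fragment with hypothetical values, not the exact ones used in this setup:

```shell
# /etc/sysctl.conf -- illustrative values only
net.core.somaxconn = 4096            # cap on the listen backlog (accept queue)
net.ipv4.tcp_max_syn_backlog = 8192  # queue for half-open (SYN_RECV) connections
net.core.rmem_max = 16777216         # max socket receive buffer (bytes)
net.core.wmem_max = 16777216         # max socket send buffer (bytes)
fs.file-max = 1000000                # system-wide file descriptor limit
```

Apply with `sysctl -p`; `somaxconn` and `tcp_max_syn_backlog` are the ones that matter most for surviving a reconnect storm.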

Our pipeline has the following handlers:

  • pipeline.addLast(sslHandler) - Netty's built-in handler
  • pipeline.addLast(messageEncoder) - the encoder and decoder just convert bytes to and from message frames as required
  • pipeline.addLast(messageDecoder)
  • pipeline.addLast(businessHandler)

1 Answer


I think there is not much you can do with Netty 3.x here, besides maybe setting the backlog to a small number so that clients will need to reconnect later.
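The backlog referred to here is the OS accept-queue length: once it is full, the kernel refuses further connection attempts and clients must retry, which acts as a crude throttle. In Netty 3 it is set via `bootstrap.setOption("backlog", n)` on the ServerBootstrap; the effect is the same as the backlog argument of a plain ServerSocket, sketched below with a hypothetical small value:

```java
import java.net.ServerSocket;

public class SmallBacklogServer {
    public static void main(String[] args) throws Exception {
        // Backlog of 50: once ~50 connections are queued waiting for accept(),
        // the kernel rejects further attempts and clients see connection refused
        // or a timeout, forcing them to retry later.
        ServerSocket server = new ServerSocket(0 /* any free port */, 50);
        System.out.println("listening on port " + server.getLocalPort());
        server.close();
    }
}
```

Note the kernel also caps the effective value at `net.core.somaxconn`, so the requested backlog is an upper bound, not a guarantee.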

So I think your best bet is to upgrade to Netty 4 as soon as possible. Sorry....