I am writing a server program in C++. I use epoll socket multiplexing on Linux and OpenSSL 1.0.2 for the secure layer, on CentOS 6.9 Final (4 cores, 2 GB of RAM, x64).

The server is intended as a websocket/socket server library for my projects.

My problem is that the SSL_accept and SSL_do_handshake functions are slow.

I collected some statistics on the problem:

For 50k EPOLL_CTL_ADD calls (that is, 50k user connections made simultaneously/repeatedly), my main read function (the SSL_read / SSL_accept logic) spends 92.8 seconds in total.

Of these 92.8 seconds, SSL_accept (or SSL_do_handshake, which I also tried) accounts for 81.9 seconds.

But when I disable SSL (via defines) and rerun the stress test, my main read function [the read() call] spends 10.4 seconds for the same 50k connections, measured the same way.

The averages are:

  • Without SSL: my main read function takes 0.2 ms on average per read call from the epoll loop, including my upper-layer functions (10.4 s / 50,000).

  • With SSL: my main read function takes 1.85 ms on average per read call from the epoll loop, including my upper-layer functions (92.8 s / 50,000).

Under these conditions my server can accept only 18~19k users at the same time (I tried many times). The SSL version is about 9 times slower than the non-SSL version (9.25x using the rounded averages above). The server also locks up frequently under this load.
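The figures above follow directly from the measured totals. A minimal sketch of the arithmetic, using two hypothetical helper functions (the names are illustrative and not part of the server code):

```cpp
#include <cassert>

// Average cost per call in milliseconds, given a total in seconds.
double avg_ms_per_call(double total_seconds, long calls) {
    assert(calls > 0);
    return total_seconds * 1000.0 / static_cast<double>(calls);
}

// Fraction of a total that one component accounts for (0.0 to 1.0).
double share(double part_seconds, double total_seconds) {
    assert(total_seconds > 0.0);
    return part_seconds / total_seconds;
}
```

With the totals quoted above, avg_ms_per_call(10.4, 50000) is about 0.208 ms, avg_ms_per_call(92.8, 50000) is about 1.856 ms, and avg_ms_per_call(81.9, 50000) puts the handshake alone at about 1.64 ms per connection. share(81.9, 92.8) is about 0.88, so roughly 88% of the measured time is spent inside SSL_accept.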

Notes:

  • On the same VPS I tried Node.js, nginx, etc.; none of them lock up the way my server does.
  • The SSL certificate was purchased from Comodo.
  • I have researched this extensively and read through many server source codes.

My question is: why are my SSL_accept/SSL_do_handshake calls slow?

I have tried many things over 3~5 months, but I could not figure it out.

My timing logic is:

###
timeMs = CurTime()
SSL_accept(...)
timeTotalVariable += (CurTime() - timeMs)
###
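A minimal sketch of the same accumulate-around-the-call timing in C++11, assuming a monotonic clock is wanted. ScopedTimer is a hypothetical name, and CurTime() from the pseudocode is replaced here by std::chrono::steady_clock. Nanosecond accumulation is used because a millisecond counter may be too coarse for calls that average 1 to 2 ms.

```cpp
#include <chrono>

// Adds the lifetime of this object to an external running total.
// Uses steady_clock so that wall-clock adjustments do not skew the result.
class ScopedTimer {
public:
    explicit ScopedTimer(std::chrono::nanoseconds& total)
        : total_(total), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        total_ += std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::chrono::nanoseconds& total_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage would look like `{ ScopedTimer t(timeTotal); ret = SSL_accept(ssl); }`, where timeTotal is a std::chrono::nanoseconds variable. Note that on a non-blocking socket this measures only the time spent inside each call, not the time spent waiting for the peer between calls.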

This is my pseudocode:

"call" ctx = SSL_CTX_new(SSLv23_method())
; close sslv2 & sslv3 protos
; set SSL_MODE_RELEASE_BUFFERS | SSL_MODE_ENABLE_PARTIAL_WRITE | SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER | SSL_OP_NO_COMPRESSION | SSL_OP_CIPHER_SERVER_PREFERENCE | SSL_OP_NO_SESSION_RESUMPTION_ON_RENEGOTIATION
; set SSL_SESS_CACHE_OFF (if i use built-in openssl server cache, there is nothing big different)
; SSL_CTX_set_session_id_context ...
; SSL_CTX_set_verify(ctx, SSL_VERIFY_NONE, NULL);
; SSL_CTX_set_options(ctx, SSL_OP_SINGLE_DH_USE);
; SSL_CTX_set_tmp_dh(ctx, ...
; SSL_CTX_use_certificate_chain_file(...
; SSL_CTX_use_PrivateKey_file(...
; SSL_CTX_check_private_key(...
; SSL_CTX_set_cipher_list(...
; SSL_CTX_set_options(ctx, SSL_OP_SINGLE_ECDH_USE);
; SSL_CTX_set_ecdh_auto(ctx, 1); (or set curve with "prime256v1")
"call" bind(srvSock, ...
"call" srvSock = listen(...

"call" add_epoll(srvSock, ReadFlag

###

"func" Read_Handle(Sock // from my epoll( loops
  if (Sock == srvSock)
    "call" Accept_Handle(Sock
  else
    "call" Real_Read_Handle(Sock
"end func"

"func" Real_Read_Handle(Sock
  if (!SSL_is_init_finished(ssl)) {
    timeMs = CurTime()
    "call" ret = SSL_accept(ssl);
    timeTotalVariable += (CurTime() - timeMs)
    "call" err = SSL_get_error(ssl, ret);
    if (err == SSL_ERROR_WANT_READ) {
      set_epoll(Sock, ReadFlag
    return;
    }
    if (err == SSL_ERROR_WANT_WRITE) {
      set_epoll(Sock, WriteFlag
    return;
    }
    ...
  }
  "call" SSL_read(ssl, ...
  ...
"end func"

"func" Accept_Handle(Sock
  "call" newSock = accept(Sock, ...
  "call" MyNonBlockingSocketSetFunction(Sock
  "call" ssl = SSL_new(ctx
  "call" SSL_set_fd(ssl, newSock);
  "call" SSL_set_accept_state(ssl
  "call" BIO_set_nbio(SSL_get_rbio(ssl), 0);
  "call" BIO_set_nbio(SSL_get_wbio(ssl), 0);
  "call" add_epoll(newSock, ReadFlag // handshake will do at main read function.
"end func"

My code base is very large (I have been working on this server for a long time), so I have given pseudocode. As I said, I tried many things without success.

I think strace shows a problem:

read(9898, "\26\3\3\0F", 5)             = 5
read(9898, "\20\0\0BA\4@/\241|9\325\247\351S\265\7<\204\5\260\203H\\\314\212\301\324B\f\353+"..., 70) = 70
read(9898, "\24\3\3\0\1", 5)            = 5
read(9898, "\1", 1)                     = 1
read(9898, "\26\3\3\0(", 5)             = 5
read(9898, ">\205\203\0006\354\337\264{\214\6\0\4\343\311Ai%\30\347\20\307\300\253+\200}B\\\326:\211"..., 40) = 40
write(9898, "\26\3\3\0\312\4\0\0\306\0\0\1,\0\300\17\225!A4\310\312(\231.\263\23\t\265\3}\211"..., 258) = 258
read(9896, "\27\3\3\1\347", 5)          = 5
read(9896, "t()R\2\347pvE\341\n\272C\231\267\352.\21G\212u\203wO\v\16\326\4\352\205\326\340"..., 487) = 487
read(9895, "\26\3\3\0F", 5)             = 5
read(9895, "\20\0\0BA\4\213\335\242OV\2154\371\345\2321\257\217\377\236\"xN\206\322\205I{\242\307\276"..., 70) = 70
read(9895, "\24\3\3\0\1", 5)            = 5
read(9895, "\1", 1)                     = 1
read(9895, "\26\3\3\0(", 5)             = 5
read(9895, "^\341Ii\273\35\nMe\214\303\352aE\236\26q\333\274\366\375\255@;\275Ad\204ko\223\377"..., 40) = 40
write(9895, "\26\3\3\0\312\4\0\0\306\0\0\1,\0\300\17\225!A4\310\312(\231.\263\23\t\265\3}d"..., 258) = 258
read(9894, "\27\3\3\1\346", 5)          = 5
read(9894, "\n+e\16\\\330\305\322\364\367j\356b\336\3T3\va\200\324\03107_\320,\22H\33\3\350"..., 486) = 486
read(9892, "\26\3\3\0F", 5)             = 5
read(9892, "\20\0\0BA\4Q\214t7$\276$\3744\247\364,\320\376\225\2623z\204\254U\355\17\323\214S"..., 70) = 70
read(9892, "\24\3\3\0\1", 5)            = 5
read(9892, "\1", 1)                     = 1
read(9892, "\26\3\3\0(", 5)             = 5
read(9892, "\201\214T8\257nBM{\210\202V\25\340R\315)\320\343'h\341\353\351\6f!$\314\230\300\221"..., 40) = 40
write(9892, "\26\3\3\0\312\4\0\0\306\0\0\1,\0\300\17\225!A4\310\312(\231.\263\23\t\265\3}h"..., 258) = 258

These are the handshake read/write calls made inside SSL_accept on my server.

But I also tested nginx, OpenLiteSpeed, Node.js, etc. Their strace logs look like this:

read(8247, "\26\3\1\1.\1\0\1*\3\3T\214\201\305\2\365\260\r\210\212\232j\1778\315\301\312\235\354e\4"..., 1024) = 307
write(8247, "\26\3\3\0=\2\0\0009\3\3\16\241r76\203\245e\373\10rF\330l\235-\3\2436Y\326"..., 1821) = 1821
read(8178, "\26\3\1\1.\1\0\1*\3\3\367._\207\353\316\262j\277\332\352\237\363\343\30\351\370X\210\34x"..., 1024) = 307
write(8178, "\26\3\3\0=\2\0\0009\3\3\347\322\200\222\206\336y\2\243\213<\235\265\230\33\315\375\306H5\7"..., 1821) = 1821
read(8280, "\26\3\1\1.\1\0\1*\3\3\35\214\374)Vg7\225\36\340\251*\234j\35>O\234\3Dw"..., 1024) = 307
write(8280, "\26\3\3\0=\2\0\0009\3\3\240q\207T\331\262\314\261o\3\307L{U\20c\270\232\377(\364"..., 1821) = 1821
read(8567, "\26\3\1\1.\1\0\1*\3\3\255\270\3447\362\226\17\276x\205\305\334C\16$@Zd\2\353\304"..., 1024) = 307
write(8567, "\26\3\3\0=\2\0\0009\3\3\355\314fko\25C\235\260\213\273\263o`\2\234:\344\27\341]"..., 1821) = 1821

Why does my server need 8 partial reads per handshake?
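For reading the traces above: every TLS record starts with a 5-byte header (content type, two version bytes, 16-bit big-endian payload length). In the first trace each 5-byte read is followed by a read of exactly the length that header announces, while the other servers issue one 1024-byte read. A minimal sketch that decodes such a header; TlsRecordHeader and parse_record_header are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstdint>

// The 5-byte header that precedes every TLS record on the wire.
struct TlsRecordHeader {
    std::uint8_t  content_type;   // 20 = change_cipher_spec, 22 = handshake, 23 = application_data
    std::uint8_t  version_major;  // 3 for SSL 3.0 and TLS 1.0 through 1.2
    std::uint8_t  version_minor;  // 3 = TLS 1.2
    std::uint16_t length;         // payload bytes that follow the header
};

// Decodes a record header; buf must point to at least 5 readable bytes.
TlsRecordHeader parse_record_header(const unsigned char* buf) {
    assert(buf != nullptr);
    TlsRecordHeader h;
    h.content_type  = buf[0];
    h.version_major = buf[1];
    h.version_minor = buf[2];
    h.length = static_cast<std::uint16_t>((buf[3] << 8) | buf[4]);
    return h;
}
```

Applied to the bytes "\26\3\3\0F" from the trace, this yields a handshake record (type 22) of TLS 1.2 with a 70-byte payload, which matches the following read of 70 bytes. As far as I know, this header-then-body pattern is what OpenSSL does when read-ahead is off, and SSL_CTX_set_read_ahead() is the setting that controls it. Whether the extra system calls explain the measured slowness is an assumption that would need to be tested.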

Why is my SSL_accept function slow?

PS: Sorry for my bad grammar.

Edit: SSL setup code (the SSL setup is easy to reproduce, but the socket multiplexing part is very hard to reproduce):

SSL_CTX *ctx = NULL;
void Init_SSL() {
    OPENSSL_config(NULL);
    SSL_library_init();
    SSL_load_error_strings();
    OpenSSL_add_all_algorithms();
}

void Init_Server() {
    #if OPENSSL_VERSION_NUMBER < 0x10100000L
        ctx = SSL_CTX_new(SSLv23_method());
    #else
        ctx = SSL_CTX_new(TLS_method());
    #endif
    if (ctx == NULL) {
        cout << "can not create ctx" << endl;
    return;
    }
    SSL_CTX_set_options(ctx, SSL_OP_NO_SSLv2);
    SSL_CTX_set_options(ctx, SSL_OP_NO_SSLv3);
    SSL_CTX_set_mode(ctx, SSL_MODE_RELEASE_BUFFERS);
    SSL_CTX_set_mode(ctx, SSL_MODE_ENABLE_PARTIAL_WRITE);
    SSL_CTX_set_mode(ctx, SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER);
    SSL_CTX_set_options(ctx, SSL_OP_NO_COMPRESSION);
    SSL_CTX_set_options(ctx, SSL_OP_CIPHER_SERVER_PREFERENCE);
    #ifdef SSL_OP_NO_SESSION_RESUMPTION_ON_RENEGOTIATION
    SSL_CTX_set_options(ctx, SSL_OP_NO_SESSION_RESUMPTION_ON_RENEGOTIATION);
    #endif
    SSL_CTX_set_session_cache_mode(ctx, SSL_SESS_CACHE_OFF);
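    // A char string literal cannot initialize a const unsigned char* in C++; use an array instead.
    static const unsigned char TmpStr[] = "SslSrv";
    SSL_CTX_set_session_id_context(ctx, TmpStr, sizeof(TmpStr) - 1);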
    SSL_CTX_set_verify(ctx, SSL_VERIFY_NONE, NULL);
}

bool LoadCert() {
    DH *dh;
    BIO *bio = BIO_new_file("server.dh", "r");
    if (bio == NULL) {
        cout << "dh file error" << endl;
    return false;
    }
    dh = PEM_read_bio_DHparams(bio, NULL, NULL, NULL);
    BIO_free(bio);
    if (dh == NULL) {
        cout << "dh params error" << endl;
    return false;
    }
    const int size = BN_num_bits(dh->p);
    if (size < 1024) {
        cout << "dh bits size error" << endl;
        DH_free(dh);
    return false;
    }
    SSL_CTX_set_options(ctx, SSL_OP_SINGLE_DH_USE);
    int r = SSL_CTX_set_tmp_dh(ctx, dh);
    DH_free(dh); // SSL_CTX_set_tmp_dh() keeps its own copy of the parameters
    if (!r) {
        cout << "cannot set tmp dh" << endl;
    return false;
    }
    if (SSL_CTX_use_certificate_chain_file(ctx, "server.crt") <= 0) {
        cout << "cannot load cert" << endl;
    return false;
    }
    if (SSL_CTX_use_PrivateKey_file(ctx, "server.key", SSL_FILETYPE_PEM) <= 0) {
        cout << "cannot load priv key file" << endl;
    return false;
    }
    if (!SSL_CTX_check_private_key(ctx)) {
        cout << "priv key file is wrong" << endl;
    return false;
    }
    // this list from nodejs (default list)
    if (SSL_CTX_set_cipher_list(ctx, "ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA384:DHE-RSA-AES256-SHA384:ECDHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA256:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!SRP:!CAMELLIA") == 0) {
        cout << "cannot set cipher list" << endl;
    return false;
    }
    SSL_CTX_set_options(ctx, SSL_OP_SINGLE_ECDH_USE);
    #if SSL_CTRL_SET_ECDH_AUTO
    SSL_CTX_set_ecdh_auto(ctx, 1);
    #endif
return true;
}

Is anything missing or invalid in these settings? Thanks.

The SSL part is easy, but it is very difficult to produce simple code for the socket (multiplexing) part. - M. Mike
Just in case, did you check whether enough random bits are available? cat /proc/sys/kernel/random/entropy_avail should return something around 1k~2k. - LMC
# cat /proc/sys/kernel/random/entropy_avail gives 140. Hmm. - M. Mike
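The same check can be done from inside the server, for example to log the value while the stress test runs. A minimal sketch, assuming a Linux system that exposes the path quoted in the comment above; read_entropy_avail is a hypothetical helper name:

```cpp
#include <fstream>
#include <string>

// Returns the kernel's entropy estimate in bits, or -1 if the value cannot be read.
int read_entropy_avail(
        const std::string& path = "/proc/sys/kernel/random/entropy_avail") {
    std::ifstream in(path);
    int bits = 0;
    if (!(in >> bits)) {
        return -1;  // file missing, unreadable, or not a number
    }
    return bits;
}
```

Whether a low value actually affects SSL_accept depends on how the OpenSSL build seeds its random number generator, which is not established here.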

1 Answer


The SSL handshake involves expensive operations, including cryptography. A full handshake with client authentication goes through the following steps, each of which adds processing time:

  1. A client requests access to a protected resource.
  2. The server presents its certificate to the client.
  3. The client verifies the server’s certificate.
  4. If successful, the client sends its certificate to the server.
  5. The server verifies the client’s credentials.
  6. If successful, the server grants access to the protected resource requested by the client.

So if your application does not perform an SSL handshake, it avoids all of the above operations and will obviously run faster.

The most important reason for the slowdown is that, as the steps above show, there is a lot of communication between client and server, and the encryption/decryption is done synchronously, which also reduces parallel processing.