Problem:
I'm seeing an intermittent issue that causes SSL connections to be dropped. I can reproduce it in my test environment (roughly 1 out of 50k calls fails). The software I'm testing is an SMTP/POP3/IMAP server.
The SSL connection is dropped after the handshake has been performed successfully, after a few commands have been sent back and forth between the SMTP client and the server.
My server software (running on Windows) is written using:
- Boost: 1.56.0
- OpenSSL: 1.0.1j
The client application is written in .NET 3.5 using SslStream.
The server application is structured much like the HTTP Server 3 example in the Boost examples.
The call I'm making looks like this:
boost::asio::async_read_until(ssl_socket_, receive_buffer_, delimitor, strand_.wrap(AsyncReadCompletedFunction));
In AsyncReadCompletedFunction, the error contains the following:
- error_code::value = 67567722
- error_code::message = block type is not 01
Sometimes the error is instead:
- error_code::value = 67530866
- error_code::message = padding check failed
Or:
- error_code::value = 336265225
- error_code::message = PEM lib
These errors are set by OpenSSL.
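For reference, the numeric values above can be split into their parts by hand. This is a minimal sketch, assuming the OpenSSL 1.0.x packing used by the ERR_GET_LIB, ERR_GET_FUNC and ERR_GET_REASON macros (library in the top 8 bits, then 12 bits of function, then 12 bits of reason). The names `DecodedSslError`, `decode_ssl_error` and `ssl_lib_name` are hypothetical helpers, not part of OpenSSL or Boost.

```cpp
#include <cstdint>
#include <string>

// Fields of a packed OpenSSL 1.0.x error code. The bit layout is an
// assumption taken from the ERR_GET_* macros in <openssl/err.h>.
struct DecodedSslError {
    unsigned lib;     // library number, bits 24..31
    unsigned func;    // function number, bits 12..23
    unsigned reason;  // reason number, bits 0..11
};

// Hypothetical helper: split a packed code into its three fields.
inline DecodedSslError decode_ssl_error(std::uint32_t code) {
    DecodedSslError d;
    d.lib = (code >> 24) & 0xFFu;
    d.func = (code >> 12) & 0xFFFu;
    d.reason = code & 0xFFFu;
    return d;
}

// Hypothetical helper: name only the two libraries seen in this post.
inline std::string ssl_lib_name(unsigned lib) {
    switch (lib) {
        case 4:  return "RSA";  // ERR_LIB_RSA
        case 20: return "SSL";  // ERR_LIB_SSL
        default: return "other";
    }
}
```

Under that assumption, 67567722 is 0x0407006A and 67530866 is 0x04067072, both from the RSA library, while 336265225 is 0x140B0009, from the SSL library with reason 9.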
Isolation
- When I reduce the number of threads that call io_service::run() from 15 to 1, the problem appears to go away.
- I don't see the issue for non-SSL connections.
I've made sure that all calls to Asio go through a strand. Both the calls to async_write/async_read and the completion handlers are executed in a strand, as follows.
std::shared_ptr<TCPConnection> shared_this = shared_from_this();
strand_.post([AsyncWriteCompletedFunction, send_buffer_, shared_this]
{
    boost::asio::async_write(shared_this->ssl_socket_, send_buffer_,
        shared_this->strand_.wrap(AsyncWriteCompletedFunction));
});
I've tested multiple variants, such as wrapping only the handler in the strand, or posting only the call to boost::asio::async_write through the strand.
- For testing, I introduced a recursive_mutex on which I take a scoped_lock before doing anything. Every method in the connection class starts by taking this lock, so no two methods execute at the same time.
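The locking described in the last bullet looks roughly like the sketch below. It is an illustration only: it uses std::recursive_mutex and std::lock_guard from the standard library rather than whatever mutex types the real code uses, and `LockedConnection` with its methods is a hypothetical stand-in for the connection class.

```cpp
#include <mutex>

// Hypothetical stand-in for the connection class; only the locking is shown.
class LockedConnection {
public:
    // Every public method takes the lock before doing anything else.
    int start_write() {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        ++writes_started_;
        // Re-entrant call on the same thread; a recursive mutex allows this.
        return on_write_completed();
    }

    int on_write_completed() {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        return ++writes_completed_;
    }

    int writes_started() {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        return writes_started_;
    }

private:
    std::recursive_mutex mutex_;
    int writes_started_ = 0;
    int writes_completed_ = 0;
};
```

Note that a lock taken this way is released as soon as each method returns, so it only serializes the bodies of the methods themselves.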
I've introduced synchronous logging to file in my methods, and have confirmed that there are no overlapping read/write calls being made:
2014-12-01 13:12:45.362: Session 2254, Starting AsyncWrite - Call to boost::asio::async_write
2014-12-01 13:12:45.362: Session 2254, AsyncWriteCompleted - Write completion handler called
2014-12-01 13:12:45.362: Session 2254, Starting AsyncRead - Call to boost::asio::async_read
2014-12-01 13:12:45.362: Session 2254, AsyncReadCompleted - Read completion handler called, with error: The read operation failed. Bytes transferred: 0 Remote IP: 127.0.0.1, Session: 2254, Code: 67567722, Message: block type is not 01
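The same check could also be made in code rather than by reading the log. The following is a minimal sketch, not something taken from my server: `OverlapDetector` is a hypothetical class, and it assumes one instance per connection and per direction (one for reads, one for writes), with `begin()` called just before the async call and `end()` called first thing in the completion handler.

```cpp
#include <atomic>

// Hypothetical per-connection, per-direction flag: true while one
// operation is in flight.
class OverlapDetector {
public:
    // Returns true if no other operation was in flight.
    // A false return means a second operation started before the
    // first one completed, i.e. an overlap.
    bool begin() {
        bool expected = false;
        return in_flight_.compare_exchange_strong(expected, true);
    }

    // Call at the start of the completion handler.
    void end() { in_flight_.store(false); }

private:
    std::atomic<bool> in_flight_{false};
};
```

A false return from `begin()` could then be logged or asserted on, which would catch an overlap even when the timestamps in the log are identical.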
I've considered that this could be an error in the client I use to run the tests. But it's single-threaded and implemented using a loop with synchronous calls. I've double-checked the documentation for the SslStream class in .NET Framework 3.5, but can't see anything wrong there either (the client code is very simple).
The test runs within VMware on Windows XP Service Pack 3. I'm running the test on other operating systems as well, but so far I'm not seeing the issue there.
EDIT 1:
I've installed Wireshark to trace the SMTP-traffic. I see the following:
1. Handshake performed successfully.
2. Server sends welcome message ("220 HI ESMTP")
3. Client sends command EHLO test
4. Server responds with its supported commands
5. Client sends command AUTH LOGIN
6. Server requests user name
7. Client sends user name. This is the second line from the end.
8. Server disconnects client. (RST).
When looking at the server logs, I can see that the user name the client sent in step 7 was never received by the server. The async_read operation completed, and the completion handler was called with error code 336265225: "PEM lib".
I've run out of ideas for what to test. Does anyone have suggestions on what to try next?