3
votes

We have an issue with SSL, and I am 99% sure this is not your usual certificate trust store merry-go-round.

We have a WebLogic server trying to make SSL connections to Active Directory via LDAPS. The underlying SSL implementation is JSSE.

Some of the time it works, usually for a few hours after restarting WebLogic.

After that we start getting SSL handshake errors. With SSL debug turned on we see:

[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)', handling exception: java.net.SocketException: Connection reset
[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)', SEND TLSv1 ALERT: fatal, description = unexpected_message
[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)', WRITE: TLSv1 Alert, length = 32
[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)', Exception sending alert: java.net.SocketException: Broken pipe

So far I have tried the following to understand/replicate it:

  • Connecting via OpenSSL with the certs loaded - works OK every time
  • Connecting via secure ldapsearch with the certs loaded - works OK every time
  • Connecting via a custom test Java client - works OK every time
  • Decrypting the SSL handshake with Wireshark and the private key.

What I noticed with Wireshark for the "bad" handshake is that after the client sends its Change Cipher Spec and Finished messages, AD does not reply in kind. Moreover, Wireshark cannot decrypt the SSL handshake, failing with:

ssl_decrypt_pre_master_secret wrong pre_master_secret length (109, expected 48)
dissect_ssl3_handshake can't decrypt pre master secret

Note that Wireshark SSL decryption works perfectly when the SSL handshake succeeds.

I can't see any significant differences between the good and bad SSL handshakes up to the point where the AD server does not respond.

At this point I'm stumped. I'm really struggling to understand why this would fail some of the time and work the rest, so I am just hoping for some suggestions as to what might be going on.

Oh yes, almost forgot. There is an error in the Active Directory Event log:

Event ID: 36888 The following fatal alert was raised: 20. The state of the internal error is 960.

After a bit of research I discovered that alert 20 corresponds to an SSL "BAD_RECORD_MAC" error.
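For reference, the alert numbers seen in the Schannel event log and in the JSSE debug output can be mapped to their names with a small lookup. This is only a sketch: the class name `TlsAlerts` is made up for illustration, and the table holds just a subset of the alert codes defined in the TLS 1.0 to 1.2 specifications.

```java
import java.util.HashMap;
import java.util.Map;

public class TlsAlerts {
    // Subset of the TLS alert descriptions (codes as defined in the TLS RFCs).
    private static final Map<Integer, String> DESCRIPTIONS = new HashMap<>();
    static {
        DESCRIPTIONS.put(0, "close_notify");
        DESCRIPTIONS.put(10, "unexpected_message");
        DESCRIPTIONS.put(20, "bad_record_mac");
        DESCRIPTIONS.put(40, "handshake_failure");
        DESCRIPTIONS.put(42, "bad_certificate");
        DESCRIPTIONS.put(48, "unknown_ca");
        DESCRIPTIONS.put(51, "decrypt_error");
        DESCRIPTIONS.put(70, "protocol_version");
        DESCRIPTIONS.put(80, "internal_error");
        DESCRIPTIONS.put(100, "no_renegotiation");
    }

    // Returns the alert name, or a placeholder for codes not in the table.
    static String describe(int code) {
        return DESCRIPTIONS.getOrDefault(code, "unknown(" + code + ")");
    }

    public static void main(String[] args) {
        // Alert raised by AD (event 36888) and alert sent by the WebLogic client.
        System.out.println("Alert 20 = " + describe(20));
        System.out.println("Alert 10 = " + describe(10));
    }
}
```

So the server side reports a failed MAC check on a record, while the client reports an unexpected message after the connection was reset.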

The only theory I have at this point is that for some reason the wrong public key is being used to encrypt the handshake. Otherwise I can't see why the server (and Wireshark) would fail to decrypt the Finished message.

Thanks!

Updates:

I've compared the bad and good cases, and the cipher suite is the same in both: TLS_RSA_WITH_AES_128_CBC_SHA. I have also compared the packets captured on both the client and server side; barring the normal Ethernet and IP protocol differences, they appear identical.

2
This is hard to diagnose from the data provided. I'd suggest getting some logging from the cert subsystem and a network sniff in parallel (i.e. Wireshark or similar). That would help. (Eric Fleischman)

2 Answers

5
votes

So, after a great deal of research, experimentation and soul searching, we eventually tracked this issue down to a third-party library we were using to connect to an external system. Upon initialization it would add itself as a security provider ahead of the JSSE default provider. I don't know exactly why this then went on to break all subsequent SSL connections, but it did.
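A quick way to spot this kind of problem is to dump the registered security providers in preference order and see which provider actually serves the algorithms involved in the handshake. The sketch below is not the exact check we ran; the class name `ProviderOrderCheck` is made up for illustration, and the choice of `RSA/ECB/PKCS1Padding` is an assumption based on the pre-master secret being RSA-encrypted with the negotiated cipher suite.

```java
import java.security.GeneralSecurityException;
import java.security.Provider;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;
import javax.crypto.Cipher;
import javax.net.ssl.SSLContext;

public class ProviderOrderCheck {
    // Names of all registered providers, in preference order (position 1 first).
    static List<String> providerNames() {
        List<String> names = new ArrayList<>();
        for (Provider p : Security.getProviders()) {
            names.add(p.getName());
        }
        return names;
    }

    // Provider that currently serves a default "TLS" SSLContext.
    static String tlsContextProvider() {
        try {
            return SSLContext.getInstance("TLS").getProvider().getName();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Provider that currently serves RSA PKCS#1 encryption (assumed to be
    // the transformation relevant to encrypting the pre-master secret).
    static String rsaCipherProvider() {
        try {
            return Cipher.getInstance("RSA/ECB/PKCS1Padding").getProvider().getName();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int position = 1;
        for (String name : providerNames()) {
            System.out.println(position++ + ": " + name);
        }
        System.out.println("TLS SSLContext served by: " + tlsContextProvider());
        System.out.println("RSA cipher served by:     " + rsaCipherProvider());
    }
}
```

Running this inside the server right after startup and again once the failures begin should show whether a new provider has appeared at the front of the list in between.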

Thanks for your help.

1
votes

From what I understand, the problem is intermittent: you can connect to AD via SSL, but occasionally you get this error. So I guess certificate issues are not your problem.

You are not giving many actual details here, so from your description I can only think of the following. Your comment:

Note Wireshark SSL decryption works perfectly when the SSL handshake works perfectly.

gives me a hint that the cipher suite is different in the bad case. Note that Wireshark cannot decrypt a connection whose keys were negotiated with DHE, even if you have the private key.
So you should check in your investigation whether the cipher suites are indeed different in the bad and good cases (e.g. RSA vs DHE).
Additionally, from the way you describe this, it seems the problem occurs during a renegotiation. Perhaps renegotiation is disabled and you can enable it? It has been considered unsafe, and it can generally be configured in servers.
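To make the RSA vs DHE comparison concrete, the cipher suite name taken from each capture can be classified by its key exchange. This is a minimal sketch: the class name `KeyExchangeCheck` is made up for illustration, and the check relies only on the naming convention of TLS 1.2 and earlier suites.

```java
import javax.net.ssl.SSLContext;

public class KeyExchangeCheck {
    // True if the suite name indicates an ephemeral (EC)DHE key exchange,
    // which cannot be decrypted from the server's private key alone.
    // Only meaningful for TLS 1.2 and earlier names such as
    // TLS_RSA_WITH_AES_128_CBC_SHA or TLS_DHE_RSA_WITH_AES_128_CBC_SHA.
    static boolean usesEphemeralKeyExchange(String suite) {
        return suite.contains("_DHE_") || suite.contains("_ECDHE_");
    }

    public static void main(String[] args) throws Exception {
        // List what this JVM would offer by default, flagging ephemeral suites.
        String[] suites = SSLContext.getDefault()
                .getDefaultSSLParameters()
                .getCipherSuites();
        for (String suite : suites) {
            String kind = usesEphemeralKeyExchange(suite) ? "ephemeral" : "other";
            System.out.println(kind + "  " + suite);
        }
    }
}
```

If both the good and the bad handshake negotiate a plain RSA suite, the key exchange can be ruled out as the reason Wireshark fails to decrypt.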