I had set up a 6-node Cassandra cluster spanning two AWS regions / datacenters (3 nodes in each) and everything was working fine. After getting that much working, I attempted to enable internode encryption, which I cannot get to work properly despite reading innumerable documents on the subject and fiddling endlessly.

I don't see any errors or anything out of the ordinary in the logs. I do see the following line in the logs which indicates it has started the encrypted messaging service, as expected:

MessagingService.java:482 - Starting Encrypted Messaging Service on SSL port 7001

I have enabled verbose logging for SSL in cassandra-env.sh, however this does not produce any errors or additional information about SSL internode connections that I can see (update below):

JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl"

I can connect from one node to all the others on the encrypted messaging port 7001 using nc, so there's no firewall issue.

ubuntu@ip-5-6-7-8:~$ nc -v 1.2.3.4 7001
Connection to 1.2.3.4 7001 port [tcp/afs3-callback] succeeded!

I can connect to each node locally using cqlsh (I haven't enabled client-server encryption) and can query the system keyspace, etc.

However, if I run nodetool status I see that the nodes cannot see each other. Only the node that I'm running the command on is present in the list. This was not the case before internode encryption was enabled; they could all see each other just fine then.

ubuntu@ip-5-6-7-8:~$ nodetool status
Datacenter: us-east_A
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  1.2.3.4        144.75 KB  256          ?       992ae1bc-77e4-4ab1-a18f-4db62bb0ce6f  1b

My process was this:

  • Created a certificate authority for my cluster
  • Created a keystore and truststore for each node and added my CA certificate chain to both
  • Generated a key pair and CSR for each node, signed it with my CA, and added the resulting certificate to each node's keystore
  • Updated each node's configuration as shown below
  • Restarted all nodes

The server encryption configuration I'm using is this, with the appropriate values in the $variables.

server_encryption_options:
    internode_encryption: all
    keystore: $keystore_path
    keystore_password: $keystore_passwd
    truststore: $truststore_path
    truststore_password: $truststore_passwd
    require_client_auth: true
    protocol: TLS
    algorithm: SunX509
    store_type: JKS
    cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]

If anybody could offer some insight or a direction to look in it would be greatly appreciated.

Update: Cipher Suite Agreement

Apparently SSL debug logging prints to stdout, which is not logged to Cassandra's logfiles, so I didn't see that output before. Running Cassandra in the foreground I can see a ton of SSL errors tracing out, all of which complain of handshake failure, because:

javax.net.ssl.SSLHandshakeException: no cipher suites in common

In an attempt to solve this problem I have switched to the Oracle JRE (I was being lazy and using OpenJDK before) and installed the JCE unlimited strength cryptography policy files to ensure all possible ciphers would be supported.

It didn't fix anything.

This is especially confusing given that all these nodes are exactly identical: hardware, OS vendor and version, Java vendor and version, Cassandra version, and configuration file. I cannot imagine why they cannot agree on a cipher suite under these circumstances.

The following is the full error that is traced:

*** ClientHello, TLSv1.2
RandomCookie:  GMT: 1449074039 bytes = { 205, 93, 27, 38, 184, 219, 250, 8, 232, 46, 117, 84, 69, 53, 225, 16, 27, 31, 3, 7, 203, 16, 133, 156, 137, 231, 238, 39 }
Session ID:  {}
Cipher Suites: [TLS_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_256_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, TLS_EMPTY_RENEGOTIATION_INFO_SCSV]
Compression Methods:  { 0 }
***
%% Initialized:  [Session-3, SSL_NULL_WITH_NULL_NULL]
%% Invalidated:  [Session-3, SSL_NULL_WITH_NULL_NULL]
ACCEPT-/1.2.3.4, SEND TLSv1.2 ALERT:  fatal, description = handshake_failure
ACCEPT-/1.2.3.4, WRITE: TLSv1.2 Alert, length = 2
ACCEPT-/1.2.3.4, called closeSocket()
ACCEPT-/1.2.3.4, handling exception: javax.net.ssl.SSLHandshakeException: no cipher suites in common
ACCEPT-/1.2.3.4, called close()
ACCEPT-/1.2.3.4, called closeInternal(true)
INFO  16:33:59 Waiting for gossip to settle before accepting client requests...
Allow unsafe renegotiation: false
Allow legacy hello messages: true
Is initial handshake: true
Is secure renegotiation: false
ACCEPT-/1.2.3.4, setSoTimeout(10000) called
ACCEPT-/1.2.3.4, READ:  SSL v2, contentType = Handshake, translated length = 57
Try to remove require_client_auth, protocol, algorithm, store_type and cipher_suites settings and let cassandra use the defaults. Also try to use a wrong password for keystore_password and see if cassandra really throws an exception as expected. - Stefan Podkowinski
Yeah, I tried using invalid values for keystore/truststore paths and passwords and did get errors as expected. I had previously tried disabling require_client_auth but to no avail (in any case I need client auth to work for inter-dc messaging anyway). I had originally left the "advanced" parameters you mention commented out but eventually set them explicitly hoping that would somehow do something. - BWW
Your IP addresses - are they internal IPs or external IPs? I'm thinking you may need to set broadcast_address to the external IPs if it's the latter. - LHWizard
I'm using private IPs for the 'listen_address' and public IPs for the 'broadcast_address', which I believe is the correct setup. - BWW

1 Answer

After a great deal more poking and prodding I've finally managed to get this to work. The problem was related to certificates and the keystore.

As a result of these problems the SSL handshake would fail either due to certificate chain problems or cipher suite agreement problems. Cassandra rather unhelpfully discards errors related to SSL and logs nothing.

In any case, I managed to get things working by doing the following:

  • Ensure that the CA generates node certificates with both client and server key usage attributes. Failing to include one or the other will prevent nodes from authenticating to each other properly. This presents itself as the cipher suite agreement error. If you're using OpenSSL to manage your CA, I've included the -extensions configuration I used below.
  • Ensure that both the root and any intermediate CA certificates you are using (if you're using an intermediary CA) are imported into both the keystore and truststore.
  • Ensure that the node certificate imported into the keystore includes the full trust chain from the primary certificate down to the CA root, including any intermediaries – even though you have already imported these CA certificates separately into the keystore. Failing to do this presents itself as an invalid certificate chain error.

OpenSSL CA Config

Here's my extensions section for dual-role client/server certificates. You can include this in your OpenSSL config file and reference it when signing by specifying -extensions dual_cert.

[ dual_cert ]
# Extensions for dual-role user/server certificates (`man x509v3_config`).
basicConstraints = CA:FALSE
nsCertType = client, server
nsComment = "Client/Server Dual-role Certificate"
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid,issuer:always
keyUsage = critical, nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = clientAuth, serverAuth
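
To show how the section might be referenced when signing, here is a small self-contained sketch. It assumes a throwaway self-signed root CA, a config file named ca.cnf, and signing with openssl x509 -req rather than openssl ca; those choices, the v3_ca section, and all file names are assumptions for the example, and only the dual_cert section comes from my actual setup.

```shell
#!/bin/sh
# Sketch only: sign a node CSR with the dual_cert extensions and inspect the result.
# The CA, file names and the v3_ca section are hypothetical stand-ins.
set -eu
work=$(mktemp -d)
cd "$work"

cat > ca.cnf <<'EOF'
[ req ]
distinguished_name = dn
[ dn ]
commonName = Common Name
[ v3_ca ]
basicConstraints = critical, CA:TRUE
keyUsage = critical, keyCertSign, cRLSign
subjectKeyIdentifier = hash
[ dual_cert ]
basicConstraints = CA:FALSE
nsCertType = client, server
nsComment = "Client/Server Dual-role Certificate"
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid,issuer:always
keyUsage = critical, nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = clientAuth, serverAuth
EOF

# Throwaway root CA.
openssl req -x509 -newkey rsa:2048 -nodes -days 30 -config ca.cnf \
  -extensions v3_ca -subj "/CN=Example Root CA" \
  -keyout ca-root.key -out ca-root.crt

# Node key and CSR (generated with OpenSSL here to keep the example short).
openssl req -new -newkey rsa:2048 -nodes -config ca.cnf \
  -subj "/CN=node1" -keyout node1.key -out node1.csr

# Sign the CSR, applying the dual_cert section from the config file.
openssl x509 -req -in node1.csr -days 30 -CA ca-root.crt -CAkey ca-root.key \
  -CAcreateserial -extfile ca.cnf -extensions dual_cert -out node1.crt

# Both client and server authentication should be listed.
openssl x509 -in node1.crt -noout -text | grep -A1 "Extended Key Usage"
```

Running the last command against an existing node certificate is a quick way to check whether it is missing one of the two usages.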

Creating a PEM containing the full trust chain

To create a single PEM file which contains the full trust chain for your node certificate, simply cat all the certificate files in order from the node certificate up to the CA root: the node certificate first, then any intermediates, then the root.

cat node1.crt ca-intermediate.crt ca-root.crt > node1-full-chain.crt
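
Before importing the chain file it can be worth checking that the chain actually validates. The following is a sketch under assumptions: it builds a throwaway root, intermediate, and node certificate in a temporary directory (the ca.cnf file and subject names are made up for the example), concatenates them as above, and checks the result with openssl verify.

```shell
#!/bin/sh
# Sketch only: build a three-level throwaway chain and verify it.
# The config file and subject names are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"

cat > ca.cnf <<'EOF'
[ req ]
distinguished_name = dn
[ dn ]
commonName = Common Name
[ v3_ca ]
basicConstraints = critical, CA:TRUE
keyUsage = critical, keyCertSign, cRLSign
subjectKeyIdentifier = hash
EOF

# Root CA (self-signed).
openssl req -x509 -newkey rsa:2048 -nodes -days 30 -config ca.cnf \
  -extensions v3_ca -subj "/CN=Example Root CA" \
  -keyout ca-root.key -out ca-root.crt

# Intermediate CA, signed by the root.
openssl req -new -newkey rsa:2048 -nodes -config ca.cnf \
  -subj "/CN=Example Intermediate CA" \
  -keyout ca-intermediate.key -out ca-intermediate.csr
openssl x509 -req -in ca-intermediate.csr -days 30 \
  -CA ca-root.crt -CAkey ca-root.key -CAcreateserial \
  -extfile ca.cnf -extensions v3_ca -out ca-intermediate.crt

# Node certificate, signed by the intermediate.
openssl req -new -newkey rsa:2048 -nodes -config ca.cnf \
  -subj "/CN=node1" -keyout node1.key -out node1.csr
openssl x509 -req -in node1.csr -days 30 \
  -CA ca-intermediate.crt -CAkey ca-intermediate.key -CAcreateserial \
  -out node1.crt

# Node certificate first, root last.
cat node1.crt ca-intermediate.crt ca-root.crt > node1-full-chain.crt

# Count the certificates in the chain file, then verify the node certificate
# against the root, supplying the intermediate as an untrusted helper.
grep -c "BEGIN CERTIFICATE" node1-full-chain.crt
openssl verify -CAfile ca-root.crt -untrusted ca-intermediate.crt node1.crt
```

A non-zero exit status from openssl verify points to a gap or ordering problem in the chain.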