We are using a Cassandra database in our production environment. We have a single cross-colo cluster of 24 nodes: 12 nodes in our PHX colo and 12 in SLC. We use a replication factor of 4, which means 2 copies are kept in each datacenter.
Below is how the keyspace and column family were created by our production DBAs:
create keyspace profile with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = {slc:2,phx:2};
create column family PROFILE_USER with key_validation_class = 'UTF8Type' and comparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and gc_grace = 86400;
We are running Cassandra 1.2.2 with org.apache.cassandra.dht.Murmur3Partitioner, key caching, SizeTieredCompactionStrategy, and virtual nodes enabled. The Cassandra nodes are deployed on HDDs instead of SSDs.
I am using the Astyanax client to read data from the Cassandra database at consistency level ONE. I inserted 50 million records (around 285 GB of data in total across the 24 nodes) into the production cluster using the Astyanax client, and after compaction finished I started doing reads against the production database.
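For context, here is a quick back-of-the-envelope on the data volume (this assumes the 285 GB figure is the total on-disk size including all 4 replicas; if it is the pre-replication size the numbers scale accordingly):

```java
public class DataSizing {
    public static void main(String[] args) {
        double totalOnDiskGb = 285.0;   // reported total across all 24 nodes
        int nodes = 24;
        int replicationFactor = 4;      // 2 copies per datacenter
        long rows = 50_000_000L;

        // Roughly how much each node holds on disk.
        double gbPerNode = totalOnDiskGb / nodes;
        // Unique (logical) data, assuming the total already includes replicas.
        double logicalGb = totalOnDiskGb / replicationFactor;
        // Average on-disk size of one row of unique data.
        double avgRowBytes = logicalGb * 1024 * 1024 * 1024 / rows;

        System.out.printf("Per node: %.1f GB%n", gbPerNode);       // ~11.9 GB
        System.out.printf("Unique data: %.1f GB%n", logicalGb);    // ~71.3 GB
        System.out.printf("Avg row size: %.0f bytes%n", avgRowBytes); // ~1.5 KB
    }
}
```

At roughly 12 GB per node, the working set may well exceed what the OS page cache can hold, which matters for read latency on HDDs.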
Below is the code by which I am creating the connection configuration using the Astyanax client:
/**
 * Creating the Cassandra connection using the Astyanax client.
 */
private CassandraAstyanaxConnection() {
    context = new AstyanaxContext.Builder()
        .forCluster(ModelConstants.CLUSTER)
        .forKeyspace(ModelConstants.KEYSPACE)
        .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyConnectionPool")
            .setPort(9160)
            .setMaxConnsPerHost(100)
            .setSeeds("cdb03.vip.phx.host.com:9160,cdb04.vip.phx.host.com:9160")
            .setLocalDatacenter("phx")) // filtering out the nodes based on data center
        .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
            .setCqlVersion("3.0.0")
            .setTargetCassandraVersion("1.2")
            .setConnectionPoolType(ConnectionPoolType.ROUND_ROBIN)
            .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE))
        .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
        .buildKeyspace(ThriftFamilyFactory.getInstance());
    context.start();
    keyspace = context.getEntity();
    emp_cf = ColumnFamily.newColumnFamily(
        ModelConstants.COLUMN_FAMILY,
        StringSerializer.get(),
        StringSerializer.get());
}
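For reference, this is roughly how my reads look against that keyspace. The row key "user123" is just an illustrative value; the query itself uses the standard Astyanax prepareQuery/getKey/execute chain with the consistency level set explicitly to ONE:

```java
// Sketch of a single-row read at consistency level ONE (row key is illustrative).
OperationResult<ColumnList<String>> result = keyspace
        .prepareQuery(emp_cf)
        .setConsistencyLevel(ConsistencyLevel.CL_ONE)
        .getKey("user123")
        .execute();

ColumnList<String> columns = result.getResult();
for (Column<String> column : columns) {
    System.out.println(column.getName() + " = " + column.getStringValue());
}
```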
Most of the time I am getting a 95th percentile read latency of around 8-10 ms.
I am trying to see whether there is any way I can get much better read performance out of Cassandra. I was under the impression that I would get a 95th percentile of 1 or 2 ms, but after running some tests on the production cluster my hypothesis turned out to be wrong. The average ping time from the machine running my client program to the Cassandra production nodes is 0.3 ms.
Below is the result I am getting.
Read latency (95th percentile):      8 milliseconds
Number of threads:                   10
Duration of the run (minutes):       30
Throughput (requests/second):        1584
Total number of IDs requested:       2,851,481
Total number of columns requested:   52,764,072
Can anyone shed some light on what else I can try to achieve better read latency? I am sure there are other people running Cassandra in production in a similar situation. Any help would be appreciated.
Thanks for the help.