I've managed to set up a Cassandra cluster in Microsoft Azure. The cluster currently contains 2 nodes on 2 Azure VMs. I've been using OpsCenter to check the status of the cluster, and everything looks healthy. However, I've written a simple C# test client that connects to the cluster using the DataStax C# Driver, and although it works, it is really, really slow.

static class SimpleClient
{
    private static Session _session;
    public static Session Session 
    { 
        get 
        { 
            return _session; 
        } 
    }

    private static Cluster _cluster;
    public static Cluster Cluster 
    { 
        get 
        { 
            return _cluster; 
        } 
    }

    public static void Connect(String node)
    {
        Console.WriteLine("Connecting to " + node);
        _cluster = Cluster.Builder().AddContactPoint(node).Build();
        _session = _cluster.Connect();
        Metadata metadata = _cluster.Metadata;
        Console.WriteLine("Connected to cluster: " + metadata.ClusterName.ToString());
    }

    public static void Close()
    {
        _cluster.Shutdown();
    }

    public static void CreateTable()
    {
        Console.WriteLine("Creating table with name test1");
        _session.Execute(" CREATE TABLE kstt.test1 ( identifier text PRIMARY KEY, name text ); ");
        Console.WriteLine("Table created with name test1");
    }

    public static void InsertToTable()
    {
        Console.WriteLine("Inserting data into test1");
        _session.Execute(" INSERT INTO kstt.test1 ( identifier, name ) VALUES ( '" + "hello" + "', '" + "man" + "' );");
        Console.WriteLine("Data inserted into test1");
    }

    public static void ReadFromTable(int times)
    {
        Console.WriteLine("Reading data from test1");
        for (int i = 0; i < times; i++)
        {
            RowSet results = _session.Execute(" SELECT * FROM kstt.test1; ");
            foreach (CqlColumn cqlColumn in results.Columns)
            {
                Console.WriteLine("Keyspace: " + cqlColumn.Keyspace + " # Table: " + cqlColumn.Table + " # Name: " + cqlColumn.Name); 
            }
        }
        Console.WriteLine("Data was read from test1");
    }

    public static void DropTable()
    {
        Console.WriteLine("Dropping table test1");
        try
        {
            _session.Execute(" DROP TABLE kstt.test1; ");
        }
        catch { }
        Console.WriteLine("Dropped table test1");
    }
}

This code does work, but it's extremely slow: it takes about 10 seconds to connect and about 10 more seconds to execute a query. I think it has something to do with the load balancer built into Azure, together with the cassandra.yaml settings.

I've also noticed that the cluster returns 2 IPs: one is the external IP of the cluster, and the other is the internal IP of one specific node, which of course is unreachable from outside.

This is our setup:

Load Balancer on Port 9042

Load Balancer on Port 9160

cassandra-node1 with external IP 66.55.44.33 and internal IP 33.44.33.44

cassandra-node2 with external IP 66.55.44.33 and internal IP 11.22.11.22

cassandra.yaml

Listen address for cassandra-node1: 33.44.33.44

RPC address for cassandra-node1: 33.44.33.44

Listen address for cassandra-node2: 11.22.11.22

RPC address for cassandra-node2: 11.22.11.22

Sometimes the program even ends up with a WriteTimeoutException when executing a query.

What's your ping time to the Cassandra nodes from wherever you're running the code? 10 seconds is obviously unreasonable, and I don't see anything wrong with your code. (rs_atl)

1 Answer

While it's difficult to diagnose a performance issue from these details alone, here are a few questions and comments:

  1. Where is your client running from?
  2. What's the ping time between the Cassandra nodes, and from the client machine to the Cassandra nodes?
  3. You don't actually need a load balancer on port 9042, as the driver can do load balancing by itself.
  4. Normally you should see some improvement by using prepared statements.
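
To illustrate points 3 and 4, here is a minimal sketch rather than a definitive fix. It assumes the client runs somewhere that can reach both nodes directly (for example, inside the same Azure virtual network), so the internal addresses from the question are used as contact points instead of the load-balanced address. The class name PreparedClient is hypothetical, and the keyspace and table are the ones from the question. The Stopwatch is only there to help separate connection time from query time.

```csharp
using System;
using System.Diagnostics;
using Cassandra;

static class PreparedClient
{
    public static void Main()
    {
        // Assumption: both node addresses are directly reachable from this machine.
        // The driver discovers the ring and balances requests itself.
        var timer = Stopwatch.StartNew();
        var cluster = Cluster.Builder()
            .AddContactPoints("33.44.33.44", "11.22.11.22")
            .Build();
        var session = cluster.Connect();
        Console.WriteLine("Connect took " + timer.ElapsedMilliseconds + " ms");

        // Prepare the statement once; the server parses it a single time.
        var insert = session.Prepare(
            "INSERT INTO kstt.test1 (identifier, name) VALUES (?, ?)");

        // Bind values instead of concatenating them into the CQL string.
        timer.Restart();
        session.Execute(insert.Bind("hello", "man"));
        Console.WriteLine("Insert took " + timer.ElapsedMilliseconds + " ms");

        cluster.Shutdown();
    }
}
```

If the connect time stays near 10 seconds even with direct contact points, the delay is more likely network reachability (the driver waiting on an address it cannot reach) than the queries themselves.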