0
votes

I've been using the DataStax-provided Apache Cassandra (v2.x) for my project. I'm creating a column family using the DataStax APIs as below:

//Create cluster
Cluster cluster = Cluster.builder().addContactPoint(hostNameOrIp).build();
//Get session
Session session = cluster.connect();
//Create keyspace using session
session.execute(String.format(
        "CREATE KEYSPACE IF NOT EXISTS %s WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': %d}",
        QueryBuilder.quote("MY_KS"),
        1));

String tableQuery = "CREATE TABLE timeline2 ("
    + "key varchar, "
    + "open float, "
    + "high float, "
    + "low float, "
    + "close float, "
    + "volume int, "
    + "adjusted float, "
    + "dtime timestamp, "
    + "PRIMARY KEY (key, dtime)"
    + ")";

//create columnFamily using session
ResultSet result = session.execute(tableQuery);

I've now been asked to move from the DataStax-provided Cassandra to the plain vanilla flavor of Apache Cassandra (v2.x) and do the same thing using the Hector APIs.

However, I've been unable to find similar APIs in Hector. What I've done so far is below:

Map<String, String> accessMap = new HashMap<String, String>();
accessMap.put("username", username);
accessMap.put("password", password);

Cluster cassandraCluster = HFactory.getOrCreateCluster("TEST_CLUSTER", new CassandraHostConfigurator(cassandraUrl), accessMap);

ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition("MY_KS", ComparatorType.BYTESTYPE);

KeyspaceDefinition newKeyspaceDef = HFactory.createKeyspaceDefinition("MY_KS", ThriftKsDef.DEF_STRATEGY_CLASS, 1, Arrays.asList(cfDef));

//Add the schema to the cluster.
//"true" as the second param means that Hector will block until all nodes see the change.
cassandraCluster.addKeyspace(newKeyspaceDef, true);

Keyspace ksp = HFactory.createKeyspace("MY_KS", cassandraCluster);

I'm now stuck at this point. I cannot find an API in Hector that accepts a simple query string for CREATE TABLE, as was possible with the DataStax APIs (i.e., by passing plain CQL). I explored various other options on the internet but could not find a straightforward solution. One option I saw on the Hector wiki was ColumnFamilyTemplate. Another was BasicColumnDefinition. A third was the Mutator.insert() operation.

But none of these makes it clear how I would define the datatype of the columns of my table (a.k.a. column family).

Moreover, there isn't clear enough guidance or API documentation on what exactly Serializers (StringSerializer, etc.) and Comparators are.

Can someone please help me out with this? My overall objective is to find APIs in Hector that can take a simple CQL query and execute it (as is possible with the DataStax APIs).


@Alex Popescu

Thanks for clarifying, I now understand.

I've now modified my client as below:

//This will give a connection to the cluster
Cluster cassandraCluster = connectApacheCassandra();

ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition("TEST_KS", "TEST_CF",
        ComparatorType.BYTESTYPE);

KeyspaceDefinition newKeyspaceDef = HFactory.createKeyspaceDefinition("TEST_KS", ThriftKsDef.DEF_STRATEGY_CLASS, 1, Arrays.asList(cfDef));

cassandraCluster.addKeyspace(newKeyspaceDef, true);

Keyspace ksp = HFactory.createKeyspace("TEST_KS", cassandraCluster);

BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition(cfDef);

//First column: indexed, validated as a long
BasicColumnDefinition columnDefinition = new BasicColumnDefinition();
columnDefinition.setName(StringSerializer.get().toByteBuffer("aKey"));
columnDefinition.setIndexName("key_idx1");
columnDefinition.setIndexType(ColumnIndexType.KEYS);
columnDefinition.setValidationClass(ComparatorType.LONGTYPE.getClassName());
columnFamilyDefinition.addColumnDefinition(columnDefinition);

//Second column: validated as a long, no index
columnDefinition = new BasicColumnDefinition();
columnDefinition.setName(StringSerializer.get().toByteBuffer("aTestColumn"));
columnDefinition.setValidationClass(ComparatorType.LONGTYPE.getClassName());
columnFamilyDefinition.addColumnDefinition(columnDefinition);

cassandraCluster.updateColumnFamily(new ThriftCfDef(columnFamilyDefinition));

I now use cqlsh to see the output of query DESCRIBE COLUMNFAMILY "TEST_CF" and I get the following output:

CREATE TABLE "TEST_CF" (
  key blob,
  column1 blob,
  "614b6579" bigint,
  "6154657374436f6c756d6e" bigint,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='NONE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

I'm unable to understand this output. I do not see the columns "aKey" and "aTestColumn" in it. Why does the output show column names such as "key" and "column1", which I never mentioned in my code? I also don't understand the datatypes displayed in this output.

My expectation is to have an output something as below:

CREATE TABLE TEST_CF (
    aKey varchar,
    aTestColumn varchar,
    PRIMARY KEY (aKey)
);

Can you please point out what mistake I am making with the Hector API that keeps me from getting the expected output? Also, if I want a column datatype other than varchar (say float), what should I change in my code?

2 Answers

1
votes

You have to use HFactory.createColumnFamilyDefinition(..). That definition can then be added to the cluster with cluster.addColumnFamily(columnFamilyDefinition).

Hector has some support for CQL, but I have not used it. You may be able to use CQL to create your column families, too.

0
votes

The DataStax Java driver uses the CQL protocol (version 3), while Hector uses the Thrift API. You won't be able to run CQL (version 3) queries through Hector.

Extra: even though the underlying storage is the same, data written through CQL and data written through Thrift are not always laid out compatibly. You can learn more about these differences from this answer to "Difference between Thrift and CQL 3 Columns/Rows".
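One way to see this mismatch in the DESCRIBE output from the question: the quoted names "614b6579" and "6154657374436f6c756d6e" appear to be the UTF-8 bytes of the column names written through StringSerializer, printed as hex, most likely because the column family was created with the BYTESTYPE comparator. Here is a minimal sketch that decodes them using only the JDK. The class name ColumnNameHex and its decode helper are made up for this illustration and are not part of Hector or the DataStax driver.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes the hex column names printed by cqlsh back into text.
final class ColumnNameHex {

    // Converts a hex string such as "614b6579" into the UTF-8 text it encodes.
    static String decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Each pair of hex digits is one byte of the original column name.
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The two quoted names from the DESCRIBE output in the question.
        System.out.println(decode("614b6579"));               // prints aKey
        System.out.println(decode("6154657374436f6c756d6e")); // prints aTestColumn
    }
}
```

So the two columns are present in the schema, just displayed as blobs. If this reading is right, a text comparator such as ComparatorType.UTF8TYPE instead of BYTESTYPE would presumably make cqlsh show readable names, but that is an assumption and has not been verified against a running cluster.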