I've been using DataStax-provided Apache Cassandra (v2.x) for my project. I'm creating a column family using the DataStax APIs as below:
//Create cluster
Cluster cluster = Cluster.builder().addContactPoint(hostNameOrIp).build();
//Get session
Session session = cluster.connect();
//create keyspace using session
session.execute(String.format("CREATE KEYSPACE IF NOT EXISTS %s WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': %d}",
QueryBuilder.quote("MY_KS"),
1)
);
String tableQuery = "CREATE TABLE timeline2 ("
        + " key varchar,"
        + " open float,"
        + " high float,"
        + " low float,"
        + " close float,"
        + " volume int,"
        + " adjusted float,"
        + " dtime timestamp,"
        + " PRIMARY KEY (key, dtime)"
        + ")";
//create columnFamily using session
ResultSet result = session.execute(tableQuery);
I've now been asked to move from DataStax-provided Cassandra to the plain vanilla flavor of Apache Cassandra (v2.x) and do the same thing using the Hector APIs.
However, I've been unable to find similar APIs in Hector. What I've done so far is below:
Map<String, String> accessMap = new HashMap<String, String>();
accessMap.put("username", username);
accessMap.put("password", password);
Cluster cassandraCluster = HFactory.getOrCreateCluster("TEST_CLUSTER", new CassandraHostConfigurator(cassandraUrl), accessMap);
ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition("MY_KS", ComparatorType.BYTESTYPE);
KeyspaceDefinition newKeyspaceDef = HFactory.createKeyspaceDefinition("MY_KS", ThriftKsDef.DEF_STRATEGY_CLASS, 1, Arrays.asList(cfDef));
//Add the schema to the cluster.
//"true" as the second param means that Hector will block until all nodes see the change.
cassandraCluster.addKeyspace(newKeyspaceDef, true);
Keyspace ksp = HFactory.createKeyspace("MY_KS", cassandraCluster);
I'm now stuck at this point. I cannot find an API in Hector that accepts a simple query string for CREATE TABLE, as was possible with the DataStax APIs (i.e., by passing plain CQL). I explored various other options on the internet but could not find a straightforward solution. One option I saw on the Hector wiki was ColumnFamilyTemplate. Another was BasicColumnDefinition. A third was the Mutator.insert() operation.
But none of these make clear how I would define the datatype of the columns of my table (a.k.a. column family).
Moreover, there isn't clear guidance or API documentation on what exactly Serializers (StringSerializer, etc.) and Comparators are.
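To make the Serializer part of the question concrete, here is my current understanding as a minimal sketch, not Hector's actual implementation. It assumes a serializer is essentially a converter between a Java value and the raw bytes Cassandra stores, and it uses only the JDK. The class and method names (SerializerSketch, stringToBytes, and so on) are hypothetical and exist only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: hypothetical helpers, not part of Hector.
class SerializerSketch {

    // String -> bytes, assuming UTF-8 encoding.
    static ByteBuffer stringToBytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // bytes -> String; duplicate() leaves the caller's buffer position untouched.
    static String bytesToString(ByteBuffer b) {
        return StandardCharsets.UTF_8.decode(b.duplicate()).toString();
    }

    // long -> 8 big-endian bytes (ByteBuffer's default byte order).
    static ByteBuffer longToBytes(long v) {
        ByteBuffer b = ByteBuffer.allocate(Long.BYTES);
        b.putLong(v);
        b.flip();
        return b;
    }

    // 8 bytes -> long.
    static long bytesToLong(ByteBuffer b) {
        return b.duplicate().getLong();
    }

    public static void main(String[] args) {
        ByteBuffer name = stringToBytes("aKey");
        System.out.println(name.remaining() + " bytes -> " + bytesToString(name));
        System.out.println(bytesToLong(longToBytes(42L)));
    }
}
```

If that picture is right, a Comparator would be the server-side counterpart: it tells Cassandra how to interpret and order those bytes.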
Can someone please help me out with this? My overall objective is to find APIs in Hector that can take a simple CQL query and execute it (as is possible with the DataStax APIs).
@Alex Popescu
Thanks for clarifying, I now understand.
I've now modified my client as below:
//This will give a connection to the cluster
Cluster cassandraCluster = connectApacheCassandra();
ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition("TEST_KS", "TEST_CF",
ComparatorType.BYTESTYPE);
KeyspaceDefinition newKeyspaceDef = HFactory.createKeyspaceDefinition("TEST_KS", ThriftKsDef.DEF_STRATEGY_CLASS, 1, Arrays.asList(cfDef));
cassandraCluster.addKeyspace(newKeyspaceDef, true);
Keyspace ksp = HFactory.createKeyspace("TEST_KS", cassandraCluster);
BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition(cfDef);
BasicColumnDefinition columnDefinition = new BasicColumnDefinition();
columnDefinition.setName(StringSerializer.get().toByteBuffer("aKey"));
columnDefinition.setIndexName("key_idx1");
columnDefinition.setIndexType(ColumnIndexType.KEYS);
columnDefinition.setValidationClass(ComparatorType.LONGTYPE.getClassName());
columnFamilyDefinition.addColumnDefinition(columnDefinition);
columnDefinition = new BasicColumnDefinition();
columnDefinition.setName(StringSerializer.get().toByteBuffer("aTestColumn"));
columnDefinition.setValidationClass(ComparatorType.LONGTYPE.getClassName());
columnFamilyDefinition.addColumnDefinition(columnDefinition);
cassandraCluster.updateColumnFamily(new ThriftCfDef(columnFamilyDefinition));
I then ran DESCRIBE COLUMNFAMILY "TEST_CF" in cqlsh and got the following output:
CREATE TABLE "TEST_CF" (
key blob,
column1 blob,
"614b6579" bigint,
"6154657374436f6c756d6e" bigint,
value blob,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=1.000000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='NONE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
I'm unable to understand this output. I do not see the columns "aKey" and "aTestColumn" in it. Why does it show column names such as "key" and "column1", which I never mentioned in my code? I also don't understand the datatypes displayed in this output.
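One thing I did notice is that the two quoted identifiers look like hex strings. The small JDK-only sketch below (HexColumnNames and decode are hypothetical names, used only for illustration) decodes them as UTF-8, on the assumption that they are the raw bytes of the column names I set through StringSerializer:

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper: decodes a hex identifier from the cqlsh output into text.
class HexColumnNames {

    // Turn each pair of hex digits into one byte, then read the bytes as UTF-8.
    static String decode(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("614b6579"));               // prints aKey
        System.out.println(decode("6154657374436f6c756d6e")); // prints aTestColumn
    }
}
```

So the columns do seem to exist, but cqlsh shows their names as hex, possibly because the column family's comparator is BYTESTYPE. That still leaves "key", "column1" and "value" unexplained.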
I expected output something like the following:
CREATE TABLE TEST_CF (
aKey varchar,
aTestColumn varchar,
PRIMARY KEY (aKey)
);
Can you please point out where I'm going wrong with the Hector API, so that I can get the expected output? Also, if I want a column's datatype to be something other than varchar (say float), what should I change in my code?