6
votes

I'm learning Cassandra, and I came across a table that has no primary key, though it does have some indexes.

So my question is: can I create a table without a primary key?

CREATE TABLE subscription (subscriberid varchar,productid varchar,panaccessproductid varchar,operatorproductid varchar,price float,fallback varchar,paymenttype varchar,operatorid varchar,subscriptiontype varchar,expiry timestamp,subscriptionstatus varchar,created timestamp);

There is no primary key, and subscriberid, productid, operatorid and subscriptiontype are indexed. Is this possible?

From the documentation:

Primary Key:: A primary key identifies the location and order of data storage. The primary key is defined at table creation time and cannot be altered. If the primary key must be changed, a new table schema is created and the data is written to the new table. Cassandra is a partition row store, and a component of the primary key, the partition key, identifies which node will hold a particular table row. At the minimum, the primary key must consist of a partition key. Composite partition keys can split a data set so that related data is stored on separate partitions. Compound primary keys include clustering columns which order the data on a partition. The definition of a table's primary key is critical in Cassandra. Carefully model how data in a table will be inserted and retrieved before choosing which columns will define the primary key. The size of the partitions, the order of the data within partitions, the distribution of the partitions amongst the nodes of the cluster - all of these considerations determine selection of the best primary key for a table.

3 Answers

8
votes

The plain answer is no: a primary key is mandatory.

4
votes

Cassandra is not a relational DB. Using indexes the way you intend to use them does not work well in Cassandra. The primary reason is that Cassandra is designed for a use case where you have dozens, hundreds, or thousands of servers in a cluster - it uses the first part of the primary key (the partition key) to determine which servers own that data. Cassandra's secondary indexes (which you mention wanting to use) are node-local - to use one without also specifying a partition key, Cassandra has to send the query to every server in the cluster, multiplying the cost of the query by the number of nodes.
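To make the difference concrete, here is a minimal sketch. The table subscription_demo, its key layout, the index name and the literal values are all hypothetical and only there for illustration.

```cql
-- Hypothetical demo table: subscriberid is the partition key,
-- productid is a clustering column.
CREATE TABLE subscription_demo (
    subscriberid varchar,
    productid varchar,
    operatorid varchar,
    PRIMARY KEY ((subscriberid), productid)
);

-- Partition key lookup: the coordinator hashes 'S123' and only
-- contacts the replicas that own that partition.
SELECT * FROM subscription_demo WHERE subscriberid = 'S123';

-- Secondary index on a non-key column (index name is an assumption).
CREATE INDEX subscription_demo_operatorid_idx
    ON subscription_demo (operatorid);

-- Index lookup with no partition key: each node only indexes its own
-- data, so the query has to fan out across the cluster.
SELECT * FROM subscription_demo WHERE operatorid = 'OP1';
```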

Therefore, rather than creating one table with indexes on subscriberid, productid, operatorid, and subscriptiontype, you would make 4 tables, one per lookup, where the partition key is subscriberid, productid, operatorid, or subscriptiontype respectively. When you query, Cassandra will know exactly which servers own the data and will not need to ask the rest of the cluster.
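A rough sketch of what those four tables could look like. The table names and the clustering columns are my assumptions (in particular, that subscriberid plus productid identifies one subscription), and the column list is trimmed for brevity, so adjust it to your data.

```cql
-- One table per query pattern; each carries the same data.
CREATE TABLE subscription_by_subscriber (
    subscriberid varchar, productid varchar, operatorid varchar,
    subscriptiontype varchar, price float, expiry timestamp,
    PRIMARY KEY ((subscriberid), productid)
);

CREATE TABLE subscription_by_product (
    subscriberid varchar, productid varchar, operatorid varchar,
    subscriptiontype varchar, price float, expiry timestamp,
    PRIMARY KEY ((productid), subscriberid)
);

CREATE TABLE subscription_by_operator (
    subscriberid varchar, productid varchar, operatorid varchar,
    subscriptiontype varchar, price float, expiry timestamp,
    PRIMARY KEY ((operatorid), subscriberid, productid)
);

CREATE TABLE subscription_by_type (
    subscriberid varchar, productid varchar, operatorid varchar,
    subscriptiontype varchar, price float, expiry timestamp,
    PRIMARY KEY ((subscriptiontype), subscriberid, productid)
);

-- Each lookup now hits exactly one partition (literal value is made up).
SELECT * FROM subscription_by_operator WHERE operatorid = 'OP1';
```

Every write has to go to all four tables. Also keep an eye on partition size: a column with few distinct values, which subscriptiontype may well be, puts many rows into a handful of partitions.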

Yes, this does duplicate a lot of data - this is called denormalization, and is common in Cassandra.

In future versions (3.4 and higher), you'll be able to use "SASI", a new form of Cassandra indexes that may help your use case significantly, with far less denormalization required.
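For reference, a SASI index is created as a custom index. This is only a sketch: it assumes the subscription table already exists with a primary key, and the index name and search term are made up. As far as I know, newer releases treat SASI as experimental and may require it to be enabled in cassandra.yaml, so check the documentation for your version.

```cql
-- SASI index on productid in CONTAINS mode, which allows
-- substring matching with LIKE.
CREATE CUSTOM INDEX subscription_productid_sasi
    ON subscription (productid)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {'mode': 'CONTAINS'};

SELECT * FROM subscription WHERE productid LIKE '%premium%';
```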

3
votes

You can't create a table in Cassandra without a primary key. If you still want to store your data as it is, you can add an extra column to the table (say, "pk") of type UUID.

Example:

CREATE TABLE subscription (pk uuid PRIMARY KEY, subscriberid varchar,productid varchar,panaccessproductid varchar,operatorproductid varchar,price float,fallback varchar,paymenttype varchar,operatorid varchar,subscriptiontype varchar,expiry timestamp,subscriptionstatus varchar,created timestamp);

You can then insert data like this:

INSERT INTO subscription(pk, subscriberid,...) VALUES(uuid(), 'S123',...);
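As a fuller sketch against the table above, with only a few of the columns filled in. All values, including the UUID literal in the SELECT, are invented for illustration.

```cql
-- uuid() generates a random (version 4) UUID on the server side.
INSERT INTO subscription (pk, subscriberid, productid, price, created)
VALUES (uuid(), 'S123', 'P456', 9.99, toTimestamp(now()));

-- Reading a row back by key only works if you know its pk value,
-- e.g. because the application generated or stored it.
SELECT * FROM subscription
WHERE pk = 123e4567-e89b-12d3-a456-426614174000;
```

Keep in mind that a random pk is not something you will normally query by, so lookups on subscriberid and the other columns still depend on indexes, with the cluster-wide cost described in the other answer.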