1
votes

My Cassandra database is not returning the row count I expect. Please see below the details of my keyspace creation and the COUNT(*) query.

Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.

cqlsh> CREATE KEYSPACE key1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

cqlsh> CREATE TABLE key1.transcation_completemall (i text, i1 text, i2 bigint, i3 int static, i4 decimal static, i5 bigint static, i6 decimal static, i7 decimal static, PRIMARY KEY ((i), i1));


cqlsh> COPY key1.transcation_completemall (i, i1, i2, i3, i4, i5, i6, i7) FROM '/home/gpadmin/all.csv' WITH HEADER = TRUE;
Using 16 child processes

Starting copy of key1.transcation_completemall with columns [i, i1, i2, i3, i4, i5, i6, i7].
Processed: 25461792 rows; Rate: 15162 rows/s; Avg. rate: 54681 rows/s
25461792 rows imported from 1 files in 7 minutes and 45.642 seconds (0 skipped).

cqlsh> select count(*) from key1.transcation_completemall;
OperationTimedOut: errors={'127.0.0.1': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=127.0.0.1
cqlsh> exit


[gpadmin@hmaster ~]$ cqlsh --request-timeout=3600
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.


cqlsh> select count(*) from key1.transcation_completemall;

 count
---------
 2865767

(1 rows)

Warnings :
Aggregation query used without partition key

cqlsh>
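As a side note, the timeout that the error message points at (Session.execute's timeout) can apparently also be raised per request from the DataStax Python driver instead of via cqlsh --request-timeout; something like this should work:

```python
# Rough sketch: run the same COUNT(*) from the Python driver with a long
# per-request timeout (3600 s, mirroring the cqlsh --request-timeout setting).
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

row = session.execute(
    "SELECT COUNT(*) FROM key1.transcation_completemall;",
    timeout=3600,  # seconds; overrides the default request timeout for this call
).one()
print(row[0])  # the single "count" column

cluster.shutdown()
```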

I got only 2865767 rows, but the COPY command reported that Cassandra accepted 25461792 rows. The all.csv file is about 2.5 GB. To check, I exported the table to another file, test.csv, and was surprised to see that it is only 252 MB.

My question is: will Cassandra automatically remove duplicates from a table? If yes, how does it decide what counts as a duplicate: repetition of the full primary key, of the partition key only, or an exact match of every field?

Or, what other possibility is there for the data to get lost?

I would appreciate your suggestions. Thanks in advance.

Comments:
(i,i1,i2,i3,i4,i5,i6,i7) Worst. Column. Names. Ever. - Aaron
They are just example field names, @Aaron. - StratQuest

2 Answers

4
votes

Cassandra will overwrite data that has the same primary key. (In general, no database allows duplicate values for a primary key; some throw a constraint error, while others, like Cassandra, overwrite the existing data.)

Example:

CREATE TABLE test(id int, id1 int, name text, PRIMARY KEY(id, id1));

INSERT INTO test(id,id1,name) VALUES(1,2,'test');
INSERT INTO test(id,id1,name) VALUES(1,1,'test1');
INSERT INTO test(id,id1,name) VALUES(1,2,'test2');
INSERT INTO test(id,id1,name) VALUES(1,1,'test1');

SELECT * FROM test;

 id | id1 | name
----+-----+-------
  1 |   1 | test1
  1 |   2 | test2

(2 rows)

The above statements leave only two records in the table: one with primary key (1, 1) and the other with primary key (1, 2).

So in your case, wherever rows share the same values of i and i1, the later rows overwrite the earlier ones.
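If you want to confirm this against your source file, one rough check is to count the distinct (i, i1) pairs in all.csv and compare that with the COUNT(*) result; a minimal sketch (the path and header names come from your COPY command, everything else is assumed):

```python
# Minimal sketch, assuming /home/gpadmin/all.csv has a header row with the column
# names used in the COPY command (i, i1, ..., i7): count how many distinct (i, i1)
# primary keys the file actually contains. If that number is close to 2865767,
# the "missing" rows were primary-key overwrites rather than lost data.
import csv

total_rows = 0
distinct_keys = set()

with open('/home/gpadmin/all.csv', newline='') as f:
    for row in csv.DictReader(f):
        total_rows += 1
        # full primary key = partition key i plus clustering key i1
        distinct_keys.add((row['i'], row['i1']))

print('rows in CSV:          ', total_rows)
print('distinct (i, i1) keys:', len(distinct_keys))
```

Note that this holds every distinct key in memory, so for a 2.5 GB file it needs a few GB of RAM; it is only meant as a rough verification.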

0
votes

Maybe check the LIMIT option on the SELECT statement; see the reference doc here.

Ref doc says:

Specifying rows returned using LIMIT

Using the LIMIT option, you can specify that the query return a limited number of rows.

SELECT COUNT(*) FROM big_table LIMIT 50000;
SELECT COUNT(*) FROM big_table LIMIT 200000;

The output of these statements, if you had 105,291 rows in the database, would be 50,000 and 105,291 respectively. The cqlsh shell has a default row limit of 10,000. The Cassandra server and native protocol do not limit the number of rows that can be returned, although a timeout stops running queries to protect against malformed queries that would cause system instability.
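For illustration, the documented behaviour above can presumably be reproduced from the DataStax Python driver as well (big_table and the 105,291-row figure are the documentation's hypothetical example, not the table from the question):

```python
# Sketch of the documented LIMIT behaviour: COUNT(*) only aggregates over the rows
# LIMIT allows, so the first query would print 50000 and the second 105291,
# assuming big_table holds 105,291 rows as in the doc excerpt.
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')  # hypothetical keyspace holding big_table

print(session.execute("SELECT COUNT(*) FROM big_table LIMIT 50000;").one()[0])
print(session.execute("SELECT COUNT(*) FROM big_table LIMIT 200000;").one()[0])

cluster.shutdown()
```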