I am recording real-time trade data with the DataStax Cassandra Java driver. I have configured Cassandra with a single node, a replication factor of 1, and consistency level ALL.

I frequently see writes that are not recorded but also do not fail. The Java client does not throw any errors, and the success callback of the async execute is called. The trace doesn't show anything unusual:

[CassandraClient] - Adding to trades memtable on /10.0.0.118[SharedPool-Worker-1] at Mon Dec 22 22:54:04 UTC 2015

[CassandraClient] - Appending to commitlog on /10.0.0.118[SharedPool-Worker-1] at Mon Dec 22 22:54:04 UTC 2015

[CassandraClient] - Coordinator used /10.0.0.118

But when I look at the data in cqlsh, some IDs are skipped, for example 729285 and 729278 through 729282 (ignore the bad dates):

cqlsh:keyspace> select * from trades where [...] order by date desc limit 10;

 date                     | id     | price  | volume
--------------------------+--------+--------+------------
 1970-01-17 19:00:19+0000 | 729286 | 435.96 |  3.4410000
 1970-01-17 19:00:19+0000 | 729284 | 436.00 | 17.4000000
 1970-01-17 19:00:19+0000 | 729283 | 436.00 |  0.1300000
 1970-01-17 19:00:19+0000 | 729277 | 436.45 |  5.6972000
 1970-01-17 19:00:19+0000 | 729276 | 436.44 |  1.0000000
 1970-01-17 19:00:19+0000 | 729275 | 436.44 |  0.9728478
 1970-01-17 19:00:19+0000 | 729274 | 436.43 |  0.0700070
 1970-01-17 19:00:19+0000 | 729273 | 436.45 |  0.0369260
 1970-01-17 19:00:19+0000 | 729272 | 436.43 |  1.0000000
 1970-01-17 19:00:19+0000 | 729271 | 436.43 |  1.0000000

Why do some inserts silently fail? Indications point to a timestamp issue, but I can't detect a pattern.

Similar question: Cassandra - Write doesn't fail, but values aren't inserted

Might be related to: Cassandra update fails silently with several nodes

1 Answer

The fact that the writes succeed yet some records are missing suggests that Cassandra is overwriting the missing rows. A common cause of this behavior is misuse of bound statements.

Usually people prepare the statements with:

PreparedStatement ps = ...;
BoundStatement bs = ps.bind();

then they issue something like:

for (int i = 0; i < myHugeNumberOfRowsToInsert; i++) {
    // bs.bind(...) sets the values on the one shared instance and returns it
    session.executeAsync(bs.bind(xx));
}

This produces the odd behavior because the same BoundStatement instance is reused across the executeAsync calls. If the loop is fast enough to enqueue, say, 6 queries before the driver sends the first one, all of the queued queries share the same bound data. A simple fix is to create a separate BoundStatement for each call:

for (int i = 0; i < myHugeNumberOfRowsToInsert; i++) {
    // a fresh BoundStatement per call, so nothing is shared between queued queries
    session.executeAsync(new BoundStatement(ps).bind(xx));
}

This guarantees that each call gets its own statement, so the values bound in one iteration cannot overwrite those of another.
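
To see the aliasing effect without a running cluster, here is a minimal, self-contained sketch. FakeBoundStatement and FakeSession are hypothetical stand-ins that I made up for illustration; they are not the driver's classes. The sketch assumes the worst case, where the queued statement's values are only read when the queue is flushed, so the real driver's timing will differ, but the shared-instance problem is the same.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class SharedStatementDemo {

    /** Hypothetical stand-in for a mutable bound statement (NOT the driver's class). */
    static final class FakeBoundStatement {
        private int id;

        // Like a mutable bind: overwrites the stored value and returns the same instance.
        FakeBoundStatement bind(int id) {
            this.id = id;
            return this;
        }

        int id() {
            return id;
        }
    }

    /** Hypothetical stand-in for an async session: queues statements, reads them on flush. */
    static final class FakeSession {
        private final Queue<FakeBoundStatement> pending = new ArrayDeque<>();

        void executeAsync(FakeBoundStatement statement) {
            pending.add(statement); // only the reference is stored, not a copy of the values
        }

        // Simulates the point at which queued queries are actually sent.
        List<Integer> flush() {
            List<Integer> written = new ArrayList<>();
            FakeBoundStatement statement;
            while ((statement = pending.poll()) != null) {
                written.add(statement.id());
            }
            return written;
        }
    }

    /** Reuses one statement for every call, as in the problematic loop. */
    static List<Integer> writeWithSharedStatement(int n) {
        FakeSession session = new FakeSession();
        FakeBoundStatement bs = new FakeBoundStatement();
        for (int i = 0; i < n; i++) {
            session.executeAsync(bs.bind(i));
        }
        return session.flush();
    }

    /** Creates a new statement per call, as in the fix. */
    static List<Integer> writeWithFreshStatements(int n) {
        FakeSession session = new FakeSession();
        for (int i = 0; i < n; i++) {
            session.executeAsync(new FakeBoundStatement().bind(i));
        }
        return session.flush();
    }

    public static void main(String[] args) {
        System.out.println(writeWithSharedStatement(3)); // prints [2, 2, 2]
        System.out.println(writeWithFreshStatements(3)); // prints [0, 1, 2]
    }
}
```

With the shared instance, every queued entry points at the same object, so only the last bound value survives and the earlier IDs never reach the table. With a fresh instance per call, each queued entry keeps its own value.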