6
votes

I want to insert a single row with 50,000 columns into Cassandra 1.2.8. Before inserting, I have all the data for the entire row ready to go (in memory):

+---------+------+------+------+------+-------+
|         | 0    | 1    | 2    | ...  | 49999 |
| row_id  +------+------+------+------+-------+
|         | text | text | text | ...  | text  |
+---------+------+------+------+------+-------+

The column names are integers, allowing slicing for pagination. Each column value is the text value at that particular index.

CQL3 table definition:

create table results (
    row_id text,
    index int,
    value text,
    primary key (row_id, index)
) 
with compact storage;
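
For context, the pagination this model enables is a range slice over the clustering column, something like the following (a hypothetical example; the bounds and names are illustrative, and session is an already-connected com.datastax.driver.core.Session):

// Hypothetical pagination: read 1,000 columns at a time by slicing on the
// clustering column 'index'.
ResultSet page = session.execute(
        "SELECT index, value FROM results " +
        "WHERE row_id = 'my_row_id' AND index >= 0 AND index < 1000");
for (Row r : page) {
    System.out.println(r.getInt("index") + " -> " + r.getString("value"));
}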

As I already have the row_id and all 50,000 name/value pairs in memory, I just want to insert a single row into Cassandra in a single request/operation so it is as fast as possible.

The only approach I can seem to find is to execute the following 50,000 times:

INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);

where the first ? is an index counter (i) and the second ? is the text value to store at location i.

This takes a lot of time. Even when we put the above INSERTs into a batch, it takes a lot of time.
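
Concretely, the per-column loop looks roughly like this with the DataStax Java Driver (a sketch; the contact point, keyspace name, and values array are illustrative):

import com.datastax.driver.core.*;

// Sketch of the slow path described above: one round trip per column.
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("my_keyspace");

PreparedStatement ps = session.prepare(
        "INSERT INTO results (row_id, index, value) VALUES (?, ?, ?)");

for (int i = 0; i < values.length; i++) {   // values: the in-memory String[50000]
    session.execute(ps.bind("my_row_id", i, values[i]));
}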

Since we have all the data we need (the complete row) in its entirety, I would assume it would be very easy to just say "here, Cassandra, store this data as a single row in one request", for example:

//EXAMPLE-BUT-INVALID CQL3 SYNTAX:
insert into results (row_id, (index,value)) values 
    ((0,text0), (1,text1), (2,text2), ..., (N,textN));

This example isn't possible via current CQL3 syntax, but I hope it illustrates the desired effect: everything would be inserted as a single query.

Is it possible to do this in CQL3 and the DataStax Java Driver? If not, I suppose I'll be forced to use Hector or the Astyanax driver and the Thrift batch_mutate operation instead?

4
Have you tried using lists / sets / maps? For this case it should do the trick but, as Alex says, it would make an interesting addition to CQL3. – jorgebg
Yes, we have tried, and it was reasonably fast, but it completely breaks the desired data model: you cannot do slice queries on CQL3 collections. – Les Hazlewood

4 Answers

3
votes

Multiple INSERTs/UPDATEs can be done using the batch_mutate method in the Thrift API, by building a nested mutation map (row key → column family → list of mutations).

import java.nio.ByteBuffer;
import java.util.*;
import org.apache.cassandra.thrift.*;

// Row key -> (column family -> mutations). Note: the generated Java Thrift
// API keys rows by ByteBuffer, not byte[].
Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
        new HashMap<ByteBuffer, Map<String, List<Mutation>>>();

// All mutations (one per column) destined for a single column family.
List<Mutation> mutationList = new ArrayList<Mutation>();
mutationList.add(mutation);

Map<String, List<Mutation>> m = new HashMap<String, List<Mutation>>();
m.put(columnFamily, mutationList);

mutationMap.put(key, m);
client.batch_mutate(mutationMap, ConsistencyLevel.ALL);
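
Applied to the question's model, a sketch of the full wide-row write might look like this (untested; the serialization details, the values array, and the connected Cassandra.Client are assumptions):

import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.*;
import org.apache.cassandra.thrift.*;

Charset utf8 = Charset.forName("UTF-8");
long timestamp = System.currentTimeMillis() * 1000; // microseconds, by convention

// One Mutation per column: name = 4-byte int index, value = UTF-8 text.
List<Mutation> mutationList = new ArrayList<Mutation>();
for (int i = 0; i < values.length; i++) {           // values: the in-memory String[50000]
    Column column = new Column();
    column.setName(ByteBuffer.allocate(4).putInt(0, i));
    column.setValue(ByteBuffer.wrap(values[i].getBytes(utf8)));
    column.setTimestamp(timestamp);

    ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
    cosc.setColumn(column);

    Mutation mutation = new Mutation();
    mutation.setColumn_or_supercolumn(cosc);
    mutationList.add(mutation);
}

Map<String, List<Mutation>> byCf = new HashMap<String, List<Mutation>>();
byCf.put("results", mutationList);

Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
        new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
mutationMap.put(ByteBuffer.wrap("my_row_id".getBytes(utf8)), byCf);

client.batch_mutate(mutationMap, ConsistencyLevel.ALL); // single round trip
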
3
votes

Edit: only 4 days after I posted this question regarding Cassandra 1.2.8, Cassandra 2.0 final was released. 2.0 supports batch prepared statements, which should be much faster than the non-batched CQL3 required for C* < 2.0. We have not yet tested this to be sure.
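
An untested sketch of what that should look like with the DataStax Java Driver 2.0's BatchStatement (contact point and keyspace name are illustrative; UNLOGGED seems reasonable here since every statement targets the same partition):

import com.datastax.driver.core.*;

Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("my_keyspace");

PreparedStatement ps = session.prepare(
        "INSERT INTO results (row_id, index, value) VALUES (?, ?, ?)");

// All 50,000 bound statements target one partition and are sent as one batch.
BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
for (int i = 0; i < values.length; i++) {   // values: the in-memory String[50000]
    batch.add(ps.bind("my_row_id", i, values[i]));
}
session.execute(batch);

In practice a batch this large may still need to be split into chunks.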

When this question was posted 4 days ago on 30 August 2013, it was not possible in CQL3 for C* versions less than 2.0. It was only possible via a Thrift client, e.g. Astyanax's MutationBatch.
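
A rough sketch of the Astyanax MutationBatch approach (untested here; the serializers, the values array, and the configured Keyspace instance are assumptions):

import com.netflix.astyanax.ColumnListMutation;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.IntegerSerializer;
import com.netflix.astyanax.serializers.StringSerializer;

// String row key, integer column names -- matching the question's model.
ColumnFamily<String, Integer> CF_RESULTS = new ColumnFamily<String, Integer>(
        "results", StringSerializer.get(), IntegerSerializer.get());

MutationBatch batch = keyspace.prepareMutationBatch(); // keyspace: a configured Astyanax Keyspace
ColumnListMutation<Integer> row = batch.withRow(CF_RESULTS, "my_row_id");
for (int i = 0; i < values.length; i++) {
    row.putColumn(i, values[i], null); // null = no TTL
}
batch.execute(); // throws ConnectionException; one Thrift batch_mutate under the hood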

Per Alex's suggestion, I created CASSANDRA-5959 as a feature request, but it was marked as a duplicate to CASSANDRA-4693, which supposedly solved the issue for C* 2.0.

2
votes
  1. The CQL3 INSERT statement doesn't support multiple value tuples. But I think this could make an interesting addition to CQL, so please submit a feature request.

  2. The DataStax Java driver is based on CQL, so there's nothing it can do if the statement is not supported.

  3. For the time being, if you need this, your best option would be to use a Thrift-based library (nb: I'm not familiar enough with the Thrift-based API to confirm this insert would be possible, but I think it should be).

0
votes

Use a Batch statement in CQL3 if you want to do multiple inserts.

With C* 2.0, it'll be even easier and faster, since prepared statements will be enabled in batches.
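
Before 2.0, one option is sending the whole batch as a single statement string. A hedged sketch (plain string concatenation is injection-prone and shown only for brevity; session is a connected DataStax driver Session):

// Build one BEGIN ... APPLY BATCH statement covering all 50,000 columns.
StringBuilder cql = new StringBuilder("BEGIN UNLOGGED BATCH\n");
for (int i = 0; i < values.length; i++) {   // values: the in-memory String[50000]
    cql.append("INSERT INTO results (row_id, index, value) VALUES ('my_row_id', ")
       .append(i).append(", '").append(values[i]).append("');\n");
}
cql.append("APPLY BATCH;");
session.execute(cql.toString());

A statement this large may exceed the server's frame size limit, so splitting it into smaller batches may be necessary.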