I want to insert a single row with 50,000 columns into Cassandra 1.2.8. Before inserting, I have all the data for the entire row ready to go (in memory):
+---------+------+------+------+-----+-------+
|         |  0   |  1   |  2   | ... | 49999 |
| row_id  +------+------+------+-----+-------+
|         | text | text | text | ... | text  |
+---------+------+------+------+-----+-------+
The column names are integers, allowing slicing for pagination; each column value is the text stored at that particular index.
CQL3 table definition:
create table results (
    row_id text,
    index int,
    value text,
    primary key (row_id, index)
) with compact storage;
As I already have the row_id and all 50,000 name/value pairs in memory, I just want to insert a single row into Cassandra in a single request/operation so it is as fast as possible.
The only approach I can seem to find is to execute the following statement 50,000 times:
INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);
where the first ? is an index counter (i) and the second ? is the text value to store at location i.
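For concreteness, here is roughly what that loop looks like with the DataStax Java Driver (a sketch; insertRow is an illustrative name, and I bind the row id as a third parameter here rather than hard-coding it):

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Sketch of the current approach: one prepared INSERT, bound and executed
// once per column -- 50,000 round trips for a single logical row.
public class RowInserter {
    public static void insertRow(Session session, String rowId, String[] values) {
        PreparedStatement ps = session.prepare(
                "INSERT INTO results (row_id, index, value) VALUES (?, ?, ?)");
        for (int i = 0; i < values.length; i++) {
            session.execute(ps.bind(rowId, i, values[i])); // one request per column
        }
    }
}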
This takes a lot of time. Even when we put the above INSERTs into a single batch, it still takes a lot of time.
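The batched variant we tried looks roughly like this (again a sketch; as far as I can tell the 1.x driver has no batch statement API, so the batch is assembled as a CQL string):

// Sketch of the batched variant: all 50,000 INSERTs sent as one CQL batch.
StringBuilder cql = new StringBuilder("BEGIN BATCH\n");
for (int i = 0; i < values.length; i++) {
    cql.append("INSERT INTO results (row_id, index, value) VALUES ('")
       .append(rowId).append("', ")
       .append(i).append(", '")
       .append(values[i].replace("'", "''")) // escape single quotes for CQL
       .append("');\n");
}
cql.append("APPLY BATCH;");
session.execute(cql.toString());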
Since we have all the data we need (the complete row) in its entirety, I would assume it would be very easy to just say "here, Cassandra, store this data as a single row in one request", for example:
//EXAMPLE-BUT-INVALID CQL3 SYNTAX:
insert into results (row_id, (index,value)) values
((0,text0), (1,text1), (2,text2), ..., (N,textN));
This example isn't possible via current CQL3 syntax, but I hope it illustrates the desired effect: everything would be inserted as a single query.
Is it possible to do this in CQL3 and the DataStax Java Driver? If not, I suppose I'll be forced to use Hector or the Astyanax driver and the Thrift batch_insert operation instead?
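For reference, here is roughly what I imagine the single-row write would look like via Astyanax's MutationBatch, which groups all columns for a row key into one Thrift mutation call (a sketch; CF_RESULTS and insertRow are illustrative names, assuming a standard Astyanax 1.x setup):

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnListMutation;
import com.netflix.astyanax.serializers.IntegerSerializer;
import com.netflix.astyanax.serializers.StringSerializer;

public class AstyanaxRowInserter {
    // results CF: String row key, Integer column names -- matches the
    // compact-storage layout of the CQL3 table above.
    private static final ColumnFamily<String, Integer> CF_RESULTS =
            new ColumnFamily<String, Integer>("results",
                    StringSerializer.get(), IntegerSerializer.get());

    public static void insertRow(Keyspace keyspace, String rowId, String[] values)
            throws ConnectionException {
        MutationBatch m = keyspace.prepareMutationBatch();
        // All 50,000 columns accumulate in one mutation against a single
        // row key and are shipped to the cluster in one operation.
        ColumnListMutation<Integer> row = m.withRow(CF_RESULTS, rowId);
        for (int i = 0; i < values.length; i++) {
            row.putColumn(i, values[i]);
        }
        m.execute();
    }
}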