Cassandra CQL : CSV insert invert columns

Question

I am using Cassandra 3.9.

Let's say I have a simple table declared as follow :

CREATE KEYSPACE data WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

CREATE TABLE data.test (
  field1 text,
  time timestamp,
  field2 text,
  field3 text,
  PRIMARY KEY (field1, time, field3)
);

Using CQL, I want to insert content from a csv file. Here is an input sample of the file :

field1,time,field2,field3
"A",2018-01-01T19:55:00.000Z,0.001,"AAA"
"B",2018-01-01T19:55:00.000Z,0.5,"BBB"
"C",2018-01-01T19:55:00.000Z,0.081,"CCC"
"D",2018-01-01T19:55:00.000Z,0.11,"DDD"
"E",2018-01-01T19:55:00.000Z,0.22,"EEE"

And I insert my data in my table using the command COPY :

COPY data.test FROM 'FILEPATH\file.csv' WITH HEADER = TRUE ;

The data are inserted in my table, but the contents of columns 'field2', and 'field3' are inverted.

cqlsh:data> select * from test;

 field1 | time                            | field3 | field2
--------+---------------------------------+--------+--------
      C | 2018-01-01 04:00:00.000000+0000 |  0.081 |    CCC
      B | 2018-01-01 04:00:00.000000+0000 |    0.5 |    BBB
      A | 2018-01-01 04:00:00.000000+0000 |  0.001 |    AAA
      E | 2018-01-01 04:00:00.000000+0000 |   0.22 |    EEE
      D | 2018-01-01 04:00:00.000000+0000 |   0.11 |    DDD

(5 rows)

I can't find out why I have this (strange) behavior. The header on my CSV file are corrects and should be enough to fit in my table correctly.

EDIT :

Following Jan answer, I modified my insert command like that :

COPY data.test (field1,time,field3,field2) FROM 'FILEPATH\file.csv' WITH HEADER = TRUE ;

But this gave me the same result. In the other hand, when I specified my columns names in the order they are present in the file (and not the order of primary key), it gave me the correct result :

cqlsh:data> COPY data.test (field1,time,field2,field3) FROM 'FILEPATH\file.csv' WITH HEADER = TRUE ;

cqlsh:data> select * from test;

 field1 | time                            | field3 | field2
--------+---------------------------------+--------+--------
      C | 2018-01-01 04:00:00.000000+0000 |    CCC |  0.081
      B | 2018-01-01 04:00:00.000000+0000 |    BBB |    0.5
      A | 2018-01-01 04:00:00.000000+0000 |    AAA |  0.001
      E | 2018-01-01 04:00:00.000000+0000 |    EEE |   0.22
      D | 2018-01-01 04:00:00.000000+0000 |    DDD |   0.11

(5 rows)

It is not a duplicate of this question. In my case, it's not the order of the columns that is changed, but the fact that the content of the column 'field2' is in the column 'field3' and the content of the column 'field3' is in the column 'field2' — Nakeuh

JayK JayK · Accepted Answer · 2018-08-21T12:56:00

Currently, the only thing that HEADER=true does is tell cqlsh whether or not to ignore the first line.

The documentation on datastax seems to be a bit misguiding in that regard (I'll let them know), however running help COPY in cqlsh will give the following output:

HEADER=false            - whether to ignore the first line

There is currently a ticket on improving it to read from this header instead which can be found here.

Cassandra CQL : CSV insert invert columns

1 Answers